Wednesday, April 27th, 2016
9:52 am - Decisive Strategic Advantage without a Hard Takeoff (part 1)
A common question when discussing the social implications of AI is whether to expect a soft takeoff or a hard takeoff. In a hard takeoff, an AI will, within a relatively short time, grow to superhuman levels of intelligence and become impossible for mere humans to control anymore. Essentially, a hard takeoff allows the AI to achieve a so-called decisive strategic advantage (DSA) – “a level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014) – in a very short time. The main relevance of this is that if a hard takeoff is possible, then it becomes much more important to get the AI’s values right on the first try – once the AI has undergone a hard takeoff and achieved a DSA, it is in control, with whatever values we happened to give it.

However, if we wish to find out whether an AI might rapidly acquire a DSA, then the question of “soft takeoff or hard” seems too narrow. A hard takeoff would be sufficient, but not necessary, for rapidly acquiring a DSA. The more relevant question is: which competencies does the AI need to master, and at what level relative to humans, in order to acquire a DSA?

Considering this question in more detail reveals a natural reason why most previous analyses have focused on a hard takeoff specifically. Plausibly, for the AI to acquire a DSA, its level in some offensive capability must overcome humanity’s defensive capabilities. A hard takeoff presumes that the AI becomes so vastly superior to humans in every respect that this kind of advantage can be taken for granted.

As an example scenario which does not require a hard takeoff, suppose that an AI achieves a capability at biowarfare offense that overpowers biowarfare defense, as well as achieving moderate logistics and production skills.
It releases deadly plagues that decimate human society, then uses legally purchased drone factories to build up its own infrastructure and to take over abandoned human facilities.

There are several interesting points to note in conjunction with this scenario:

Attack may be easier than defense. Bruce Schneier writes that

“Attackers generally benefit from new security technologies before defenders do. They have a first-mover advantage. They’re more nimble and adaptable than defensive institutions like police forces. They’re not limited by bureaucracy, laws, or ethics. They can evolve faster. And entropy is on their side – it’s easier to destroy something than it is to prevent, defend against, or recover from that destruction. For the most part, though, society still wins. The bad guys simply can’t do enough damage to destroy the underlying social system. The question for us is: can society still maintain security as technology becomes more advanced?”

A single plague, once it has evolved or been developed, can require multi-million-dollar responses to contain. At the same time, it is trivial to produce if desired, especially using robots that do not need to fear infection. And creating new variants as new vaccines are developed may be quite easy, requiring the creation – and distribution – of yet more vaccines.

Another point that Schneier has made is that in order to keep something protected, the defenders have to succeed every time, whereas the attacker only needs to succeed once. This may be particularly hard if the attacker is capable of developing an attack that nobody has used before, such as with hijacked airplanes being used against major buildings in the 9/11 attacks, or with the various vulnerabilities that the Snowden leaks revealed the NSA to have been using for extensive eavesdropping.

Obtaining a DSA may not require extensive intelligence differences.
Debates about takeoff scenarios often center around questions such as whether a self-improving AI would quickly hit diminishing returns, and how much room for improvement there is beyond the human level of intelligence. However, these questions may be irrelevant: especially if attack is easier than defense, only a relatively small edge in some crucial competency (such as biological warfare) may be enough to give the AI a DSA. Exponential growth in the form of normal economic growth may not have produced astounding “fooms” yet, but it has produced plenty of situations where one attacker has gained a temporary advantage over others.

The less the AI cares about human values, the more destructive it may be. An AI which cares mainly about calculating the digits of pi may be willing to destroy human civilization in order to make sure that a potential threat to it is eliminated, ensuring that it can go on calculating the maximum number of digits unimpeded. However, an AI which was programmed to maximize something like the “happiness of currently-living humans” may be much less willing to risk substantial human deaths. This would force it to focus on less destructive takeover methods, potentially requiring more sophisticated abilities. It is worth noting that this only applies to AIs whose values are defined in terms of how they affect currently existing humans. An AI that was only maximizing human happiness in general might be willing to destroy all existing humans, and then recreate large numbers of humans in simulations. In effect, the AI’s values determine the level of intelligence it needs in order to achieve the kind of DSA that’s useful for its purposes.

Any destructive plan requires the ability to rebuild afterwards. It would not be of much use for the AI to destroy all of human civilization if it was dependent on electricity from human-run power plants, and incapable of building or running its own.
Thus, purely offensive capabilities will need to be paired with whatever rebuilding capacities are necessary after the offensive capabilities have been deployed. This calculation may be upset if the AI believes itself to be in immediate danger of destruction by humans, and believes that its values will still be more likely to be met in a universe where it continues to exist, even if that requires risking a universe where it cannot rebuild after deploying its attack. Thus, being threatened may force the AI’s hand and cause it to launch a potentially destructive offense even when it’s uncertain of its capability to rebuild.

The rational level of aggressiveness depends on the extent to which the AI can selectively target human resources. Human nations generally avoid creating engineered pandemics and using them against their enemies, because they know that the pandemic could easily spread back to them. An AI with no biological components might be immune to this consideration, allowing it to deploy biological weapons more freely. On the other hand, the AI might e.g. need electricity, a dependence which humans do not share and which might give them an advantage in some situations.

A way to formulate this is that attacks differ in the extent to which they can be selectively targeted. Traditional firearms only damage those targets which they are fired upon, whereas pandemics potentially threaten all the members of a species that they can infect. To the extent that the AI needs to rely on the human economy to produce resources that it needs to survive, attacks threatening the economy also threaten the AI’s resources; these resources are in a sense shared between the AI and humanity, so any attacks which cause indiscriminate damage to those resources are dangerous for both. The more the AI can design attacks which selectively deprive its opponents of resources, the lower its threshold for using them.
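Two of the quantitative intuitions in this post – that the defender must succeed every time while the attacker only needs to succeed once, and that a small edge can compound under exponential growth – can be illustrated with toy arithmetic. All numbers here are purely hypothetical, chosen only to make the shape of the argument visible:

```python
def attacker_success_prob(p_per_attempt, attempts):
    """Probability that at least one of `attempts` independent attacks
    succeeds, when each succeeds with probability `p_per_attempt`.
    The defender must win every round; the attacker only needs to win once."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts

def capability_ratio(attacker_growth, defender_growth, periods):
    """How a small growth-rate edge compounds: the ratio between two
    exponentially growing capabilities after `periods` rounds of growth."""
    return ((1.0 + attacker_growth) / (1.0 + defender_growth)) ** periods

# Illustrative numbers only: a 1%-per-attempt attack, retried 500 times,
# succeeds almost surely; a 5% vs. 4% growth edge nearly doubles the
# relative advantage over 70 periods.
print(attacker_success_prob(0.01, 500))   # ~0.993
print(capability_ratio(0.05, 0.04, 70))   # ~1.95
```

The point of the second function is that no “foom” is needed: a one-percentage-point difference in growth rates, sustained, eventually yields an arbitrarily large relative advantage.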
This blog post was written as part of research funded by the Foundational Research Institute. Originally published at Kaj Sotala. You can comment here or there.
Friday, April 22nd, 2016
6:07 am - Simplifying the environment: a new convergent instrumental goal
Convergent instrumental goals (also basic AI drives) are goals that are useful for pursuing almost any other goal, and are thus likely to be pursued by any agent that is intelligent enough to understand why they’re useful. They are interesting because they may allow us to roughly predict the behavior of even AI systems that are much more intelligent than we are.

Instrumental goals are also a strong argument for why sufficiently advanced AI systems that were indifferent towards human values could be dangerous to humans, even if they weren’t actively malicious: instrumental goals such as self-preservation or resource acquisition could come into conflict with human well-being. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

I’ve thought of a candidate for a new convergent instrumental drive: simplifying the environment to make it more predictable in a way that aligns with your goals.

Motivation: the more interacting components there are in the environment, the harder it is to predict. Go is a harder game than chess because the number of possible moves is larger, and because even a single stone can influence the game in a drastic fashion that’s hard to know in advance. Simplifying the environment makes it possible to navigate using fewer computational resources; this drive could thus be seen as a subdrive of either the cognitive enhancement or the resource acquisition drive.

Examples:

- Game-playing AIs such as AlphaGo trading expected points for lower variance, by making moves that “throw away” points but simplify the game tree and make it easier to compute.
- Programmers building increasing layers of abstraction that hide the details of the lower levels and let the programmers focus on a minimal number of moving parts.
- People acquiring insurance in order to eliminate unpredictable financial swings, sometimes even when they know that the insurance has lower expected value than not buying it.
- Humans constructing buildings with controlled indoor conditions and a stable “weather”.
- “Better the devil you know”: many people being generally averse to change, even when the changes could well be a net benefit; status quo bias.
- Ambiguity intolerance in general being a possible adaptation that helps “implement” this drive in humans.
- Arguably, the homeostasis maintained by e.g. human bodies is a manifestation of this drive, in that having a standard environment inside the body reduces evolution’s search space when looking for beneficial features.

Hammond, Converse & Grass (1995) previously discussed a similar idea, the “stabilization of environments”, according to which AI systems might be built to “stabilize” their environments so as to make them more suited for themselves, and to be easier to reason about. They listed a number of categories:

Stability of location: “The most common type of stability that arises in everyday activity relates to the location of commonly used objects. Our drinking glasses end up in the same place every time we do dishes. Our socks are always together in a single drawer. Everything has a place and we enforce everything ending up in its place.”

Stability of schedule: “Eating dinner at the same time every day or having preset meetings that remain stable over time are two examples of this sort of stability. The main advantage of this sort of stability is that it allows for very effective projection in that it provides fixed points that do not have to be reasoned about. In effect, the fixed nature of certain parts of an overall schedule reduces the size of the problem space that has to be searched.”

Stability of resource availability: “Many standard plans have a consumable resource as a precondition.
If the plans are intended to be used frequently, then availability of the resource cannot be assumed unless it is enforced. A good result of this sort of enforcement is when attempts to use a plan that depends on it will usually succeed. The ideal result is when enforcement is effective enough that the question of availability need not even be raised in connection with running the plan.”

Stability of satisfaction: “Another type of stability that an agent can enforce is that of the goals that he tends to satisfy in conjunction with each other. For example, people living in apartment buildings tend to check their mail on the way into their apartments. Likewise, many people will stop at a grocery store on the way home from work. In general, people develop habits that cluster goals together into compact plans, even if the goals are themselves unrelated.”

Stability of plan use: “We often find ourselves using familiar plans to satisfy goals even in the face of wide-ranging possibilities. For example, when one of us travels to conferences, he tends to schedule his flight in to a place as late as he can and plans to leave as late as he can on the last day. This optimizes his time at home and at the conference. It also allows him to plan without knowing anything about the details of the conference schedule. As a result, he has a standard plan that he can run in a wide range of situations without actually planning for them in any detail. It works, because it already deals with the major problems (missing classes at home and important talks at the conference) as part of its structure.”

Stability of cues: “One effective technique for improving plan performance is to improve the proper activation of a plan rather than improve the plan itself. For example, placing an important paper that needs to be reviewed on his desk before going home improves the likelihood that an agent will see and read it the next day.
Marking calendars and leaving notes serves the same sort of purpose.”

As with other goals, there can be exceptions. In particular, adversarial situations may provide an incentive to increase the complexity. For example, Go players who are losing a game will sometimes intentionally complicate it, in order to make the game harder for their opponent to figure out, in the hopes of getting lucky. And if the AI thought that it had a better ability to deal with the complexity than its competitors did, then it would have an incentive to maintain or even increase the complexity.

Possible implications:

AIs trying to take over the world may seek to simplify it, though it’s not clear exactly how they would do this. E.g. if the world was previously relatively well-understood, such as by tapping into existing human research, destructive actions can make it harder to understand, by destroying existing dynamics and creating an environment that hasn’t been studied as much. Similarly, humans often “simplify” their environment by creating additional structure (layers of abstraction, insurance contracts, houses).

In principle, the AI might also seek to complicate the world, in order to make it harder for its human adversaries to understand its plans. However, the world is already so complicated that it seems hard to make any take-over plan work reliably, even for a superintelligence. That said, the AI could still complicate any given subcomponent of its plans that might be thwarted by an opponent capable of predicting what the AI was doing.

The need for this drive suggests that tasks of any nontrivial complexity require a large amount of trial and error; even something as simple as a 19×19 Go game requires many games to build up an intuition of it, for humans and AIs alike.
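The AlphaGo example above – trading expected points for lower variance – can be sketched as a mean-variance move selection rule. This is a purely illustrative toy (AlphaGo’s actual objective is win probability, not points, and the move names and numbers below are made up):

```python
import math

def pick_move(moves, risk_aversion=1.0):
    """Mean-variance move selection: prefer moves whose outcome is easier
    to predict, even at some cost in expected points.

    `moves` maps a move name to (expected_points, variance); the score
    penalizes a move's standard deviation."""
    def score(name):
        mean, variance = moves[name]
        return mean - risk_aversion * math.sqrt(variance)
    return max(moves, key=score)

# A sharp invasion is worth more points on average, but its outcome is far
# less predictable than the solid move that simplifies the position.
moves = {
    "invade":   (8.0, 36.0),  # mean 8, std 6 -> score 8 - 6 = 2
    "simplify": (5.0, 1.0),   # mean 5, std 1 -> score 5 - 1 = 4
}
print(pick_move(moves))                     # -> simplify
print(pick_move(moves, risk_aversion=0.0))  # -> invade
```

The same rule, with the sign of `risk_aversion` flipped, captures the losing player who deliberately complicates the game: when you are behind, you want variance.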
Increasing the amount of complexity in the world might make it harder for AIs to take over – but humans tend to build additional complexity that makes the world easier to understand and control, so it’s not clear whether the world moves in a direction that’s easier or harder to take over.

These ideas might suggest the creation of “active honeypots”: defensive measures that considerably increase the complexity of modeling the human response, and which the AI might need to eliminate in order to execute a takeover, thus tipping off the humans.

This blog post was written as part of research funded by the Foundational Research Institute. Originally published at Kaj Sotala. You can comment here or there.
Friday, April 15th, 2016
11:48 am - AI risk model: single or multiple AIs?
EDIT April 20th: Replaced original graph with a clearer one.

My previous posts have basically been discussing a scenario where a single AI becomes powerful enough to threaten humanity. However, there is no reason to only focus on the scenario with a single AI. Depending on our assumptions, a number of AIs could also emerge at the same time. Here are some considerations.

A single AI

The classic AI risk scenario. Some research group achieves major headway in developing AI, and no others seem to be within reach. For an extended while, it is the success or failure of this one group that matters.

This would seem relatively unlikely to persist, given the current fierce competition in the AI scene. Whereas a single company could conceivably achieve a major lead in a rare niche with little competition, this seems unlikely to be the case for AI. A possible exception might be if a company managed to monopolize the domain entirely, or if it had development resources that few others did. For example, companies such as Google and Facebook are currently the only ones with access to certain large datasets used for machine learning. On the other hand, dependence on such huge datasets is a quirk of current machine learning techniques – an AGI would need the ability to learn from much smaller sets of data. A more plausible crucial asset might be something like supercomputing resources – possibly the first AGIs will need massive amounts of computing power.

Bostrom (2016) discusses the impact of openness on AI development. Bostrom notes that if there is a large degree of openness, and everyone has access to the same algorithms, then hardware may become the primary limiting factor. If the hardware requirements for AI were relatively low, then high openness could lead to the creation of multiple AIs. On the other hand, if hardware was the primary limiting factor and large amounts of hardware were needed, then a few wealthy organizations might be able to monopolize AI for a while.
Branwen (2015) has suggested that hardware production is reliant on a small number of centralized factories that would make easy targets for regulation. This suggests a possible route by which AI might become amenable to government regulation, limiting the number of AIs deployed. Similarly, there have been various proposals of government and international regulation of AI development. If successfully enacted, such regulation might limit the number of AIs that were deployed.

Another possible crucial asset would be the possession of a non-obvious breakthrough insight, one which would be hard for other researchers to come up with. If this was kept secret, then a single company might plausibly build up a major lead over others. [how often has something like this actually happened in a non-niche field?]

The plausibility of the single-AI scenario is also affected by the length of a takeoff. If one presumes a takeoff that takes only a few months, then a single-AI scenario seems more likely. Successful AI containment procedures may also increase the chances of there being multiple AIs, as the first AIs remain contained, allowing other projects to catch up.

Multiple collaborating AIs

A different scenario is one where a number of AIs exist, all pursuing shared goals. This seems most likely to come about if all the AIs are created by the same actor. This scenario is noteworthy because the AIs do not necessarily need to be superintelligent individually, but they may have a superhuman ability to coordinate and put the interest of the group above individual interests (if they even have anything that could be called an individual interest).

This possibility raises the question – if multiple AIs collaborate and share information with each other, to such an extent that the same data can be processed by multiple AIs at a time, how does one distinguish between multiple collaborating AIs and one AI composed of many subunits?
This is arguably not a distinction that would “cut reality at the joints”, and the difference may be more a question of degree. The distinction likely makes more sense if the AIs cannot completely share information between each other, such as because each of them has developed a unique conceptual network, and cannot directly integrate information from the others but has to process it in its own idiosyncratic way.

Multiple AIs with differing goals

A situation with multiple AIs that did not share the same goals could occur if several actors reached the capability for building AIs around the same time. Alternatively, a single organization might deploy multiple AIs intended to achieve different purposes, which might come into conflict if measures to enforce cooperativeness between them failed or were never deployed in the first place (maybe because of an assumption that they would have non-overlapping domains).

One effect of having multiple groups developing AIs is that this scenario may remove the possibility of pausing to pursue further safety measures before deploying the AI, or of deploying an AI with safeguards that reduce performance (Bostrom 2016). If the actor that deploys the most effective AI earliest on can dominate the others who take more time, then the more safety-conscious actors may never have the time to deploy their AIs. Even if none of the AI projects chose to deploy their AIs carelessly, the more AI projects there are, the more likely it becomes that at least one of them will have its containment procedures fail.

The possibility has been raised that having multiple AIs with conflicting goals would be a good thing, in that it would allow humanity to play the AIs against each other. This seems highly unobvious, for it is not clear why humans wouldn’t simply be caught in the crossfire. In a situation with superintelligent agents around, it seems more likely that humans would be the ones being played.
Bostrom (2016) also notes that unanticipated interactions between AIs already happen even with very simple systems, such as in the interactions that led to the Flash Crash, and that AIs that reasoned in non-human ways in particular could be very difficult for humans to anticipate once they started basing their behavior on what the other AIs did.

A model with assumptions

Here’s a new graphical model of an AI scenario, embodying a specific set of assumptions. This one tries to take a look at some of the factors that influence whether there might be a single AI or several. This model both makes a great number of assumptions, AND leaves out many important ones! For example, although I discussed openness above, openness is not explicitly included in this model. By sharing this, I’m hoping to draw commentary on 1) which assumptions people feel are the most shaky and 2) which additional ones are valid and should be explicitly included. I’ll focus on those in future posts.

Written explanations of the model:

We may end up in a scenario where there is (for a while) only a single AI or a small number of AIs if at least one of the following is true:

- The breakthrough needed for creating AI is highly non-obvious, so that it takes a long time for competitors to figure it out.
- AI requires a great amount of hardware and only a few of the relevant players can afford to run it.
- There is effective regulation, only allowing some authorized groups to develop AI.

We may end up with effective regulation at least if:

- AI requires a great amount of hardware, and hardware is effectively regulated. (This is not meant to be the only way by which effective regulation can occur, just the only one that was included in this flowchart.)

We may end up in a scenario where there are a large number of AIs if:

- There is a long takeoff and competition to build them (i.e. ineffective regulation).

If there are few AIs, and the people building them take their time to invest in value alignment and/or are prepared to build AIs that are value-aligned even if that makes them less effective, then there may be a positive outcome. If the people building AIs do not do these things, then the AIs are not value-aligned and there may be a negative outcome.

If there are many AIs, and there are people who are ready to invest time and efficiency in value-aligned AI, then those AIs may be outcompeted by AIs whose creators did not invest in those things, and there may be a negative outcome.

Not displayed in the diagram because it would have looked messy: if there’s a very short takeoff, this can also lead to there being only a single AI, since the first AI to cross a critical threshold may achieve dominance over all the others. However, if there is fierce competition, this still doesn’t necessarily leave time for safeguards and taking time to achieve safety – other teams may also be near the critical threshold.

This blog post was written as part of research funded by the Foundational Research Institute. Originally published at Kaj Sotala. You can comment here or there.
Tuesday, April 5th, 2016
10:59 am - Disjunctive AI risk scenarios: AIs gaining the power to act autonomously
Monday, April 4th, 2016
12:59 pm - Disjunctive AI risk scenarios: AIs gaining a decisive advantage
Monday, February 8th, 2016
11:03 am - Reality is broken, or, an XCOM2 review
Wednesday, December 16th, 2015
10:10 am - Me and Star Wars
Saturday, November 28th, 2015
6:26 pm - Desiderata for a model of human values
Soares (2015) defines the value learning problem as:

“By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?”

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.

To improve on Dewey’s definition, we would need a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.

By “human values”, I here basically mean the values of any given individual: we are not talking about the values of, say, a whole culture, but rather just one person within that culture. While the problem of aggregating or combining the values of many different individuals is also an important one, we should probably start from the point where we can understand the values of just a single person, and then use that understanding to figure out what to do with conflicting values.

In order to make the purpose of this exercise as clear as possible, let’s start with the most important desideratum, of which all the others are arguably special cases:

1. Useful for AI safety engineering.
Our model needs to be useful for the purpose of building AIs that are aligned with human interests, such as by making it possible for an AI to evaluate whether its model of human values is correct, and by allowing human engineers to evaluate whether a proposed AI design would be likely to further human values.

In the context of AI safety engineering, the main model for human values that gets mentioned is that of utility functions. The one problem with utility functions that everyone always brings up is that humans have been shown not to have consistent utility functions. This suggests two new desiderata:

2. Psychologically realistic. The proposed model should be compatible with what we know about actual human values, and not make predictions about human behavior which can be shown to be empirically false.

3. Testable. The proposed model should be specific enough to make clear predictions, which can then be tested.

As additional requirements related to the above ones, we may wish to add:

4. Functional. The proposed model should be able to explain what the functional role of “values” is: how do they affect and drive our behavior? The model should be specific enough to allow us to construct computational simulations of agents with a similar value system, and see whether those agents behave as expected within some simulated environment.

5. Integrated with existing theories. The proposed model should, to as large an extent as possible, fit together with existing knowledge from related fields such as moral psychology, evolutionary psychology, neuroscience, sociology, artificial intelligence, behavioral economics, and so on.

However, I would argue that as a model of human value, utility functions also have other clear flaws. They do not clearly satisfy these desiderata:

6. Suited for modeling internal conflicts and higher-order desires. A drug addict may desire a drug, while also desiring that he not desire it.
More generally, people may be genuinely conflicted between different values, endorsing contradictory sets of them given different situations or thought experiments, and they may struggle to behave in the way in which they would like to behave. The proposed model should be capable of modeling these conflicts, as well as the way that people resolve them.

7. Suited for modeling changing and evolving values. A utility function is implicitly static: once it has been defined, it does not change. In contrast, human values are constantly evolving. The proposed model should be able to incorporate this, as well as to predict how our values would change given some specific outcomes. Among other benefits, an AI whose model of human values had this property might be able to predict things that our future selves would regret doing (even if our current values approved of those things), and warn us about this possibility in advance.

8. Suited for generalizing from our existing values to new ones. Technological and social change often cause new dilemmas, for which our existing values may not provide a clear answer. As a historical example (Lessig 2004), American law traditionally held that a landowner did not only control his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question – could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? In answer to this question, the concept of landownership was redefined to only extend a limited, and not an indefinite, amount upwards. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel.
Our model of value should be capable of figuring out such compromises, rather than treating values such as landownership as black boxes, with no understanding of why people value them.

As an example of using the current criteria, let’s try applying them to the only paper that I know of that has tried to propose a model of human values in an AI safety engineering context: Sezener (2015). This paper takes an inverse reinforcement learning approach, modeling a human as an agent that interacts with its environment in order to maximize a sum of rewards. It then proposes a value learning design where the value learner is an agent that uses Solomonoff’s universal prior in order to find the program generating the rewards, based on the human’s actions. Basically, a human’s values are equivalent to a human’s reward function. Let’s see to what extent this proposal meets our criteria.

Useful for AI safety engineering. To the extent that the proposed model is correct, it would clearly be useful. Sezener provides an equation that could be used to obtain the probability of any given program being the true reward-generating program. This could then be plugged directly into a value learning agent similar to the ones outlined in Dewey (2011), to estimate the probability of its models of human values being true. The equation itself is incomputable, but computable approximations could be constructed.

Psychologically realistic. Sezener assumes the existence of a single, distinct reward process, and suggests that this is a “reasonable assumption from a neuroscientific point of view because all reward signals are generated by brain areas such as the striatum”. On the face of it, this seems like an oversimplification, particularly given evidence suggesting the existence of multiple valuation systems in the brain.
On the other hand, since the reward process is allowed to be arbitrarily complex, it could be taken to represent just the final output of the combination of those valuation systems.

Testable. The proposed model currently seems to be too general to be accurately tested. It would need to be made more specific.

Functional. This is arguable, but I would claim that the model does not provide much of a functional account of values: they are hidden within the reward function, which is basically treated as a black box that takes in observations and outputs rewards. While a value learner implementing this model could develop various models of that reward function, and those models could include internal machinery that explained why the reward function output various rewards at different times, the model itself does not make any assumptions about this.

Integrated with existing theories. Various existing theories could in principle be used to flesh out the internals of the reward function, but currently no such integration is present.

Suited for modeling internal conflicts and higher-order desires. No specific mention of this is made in the paper. The assumption of a single reward function that assigns a single reward for every possible observation seems to implicitly exclude the notion of internal conflicts, with the agent always just maximizing a total sum of rewards and being internally united in that goal.

Suited for modeling changing and evolving values. As written, the model seems to consider the reward function as essentially unchanging: “our problem reduces to finding the most probable $p_R$ given the entire action-observation history $a_1o_1a_2o_2 \dots a_no_n$.”

Suited for generalizing from our existing values to new ones. There does not seem to be any obvious possibility for this in the model.
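To make the shape of the proposal concrete, here is a toy sketch of this kind of reward inference. Since Solomonoff’s universal prior over reward programs is incomputable, the sketch swaps it for a small finite hypothesis set of reward functions weighted by a 2^-complexity prior, and models the human as Boltzmann-rational when choosing actions; all of the names, hypotheses, and numbers below are my own illustrative assumptions, not from Sezener’s paper.

```python
import math

# Toy, computable stand-in for Sezener-style reward inference. Instead of
# a universal prior over reward *programs*, we enumerate a few candidate
# reward functions, each tagged with a crude "description length" in bits.

ACTIONS = ["work", "rest"]

HYPOTHESES = {
    "always_work": (lambda a: 1.0 if a == "work" else 0.0, 4),
    "always_rest": (lambda a: 1.0 if a == "rest" else 0.0, 4),
    "indifferent": (lambda a: 0.5, 2),  # simpler, hence fewer bits
}

def action_likelihood(reward, action, beta=2.0):
    """P(action | reward) for a Boltzmann-rational human: actions with
    higher reward are exponentially more likely, tempered by beta."""
    z = sum(math.exp(beta * reward(a)) for a in ACTIONS)
    return math.exp(beta * reward(action)) / z

def posterior(history):
    """Posterior over reward hypotheses given an observed action history:
    2^-bits complexity prior times the likelihood of the actions."""
    scores = {}
    for name, (reward, bits) in HYPOTHESES.items():
        prior = 2.0 ** -bits
        like = 1.0
        for a in history:
            like *= action_likelihood(reward, a)
        scores[name] = prior * like
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

post = posterior(["work"] * 10 + ["rest"])
```

Given a mostly-“work” history, the posterior concentrates on the hypothesis that the human values working; with only a handful of observations, the simpler “indifferent” hypothesis can still dominate on prior alone, mirroring the Occam trade-off that the universal prior is meant to formalize.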
I should note that despite its shortcomings, Sezener’s model seems like a nice step forward: like I said, it’s the only proposal that I know of so far that has even tried to answer this question. I hope that my criteria will be useful in spurring the development of the model further. As it happens, I have a preliminary suggestion for a model of human values which I believe has the potential to fulfill all of the criteria that I have outlined. However, I am far from certain that I have managed to find all the necessary criteria. Thus, I would welcome feedback, particularly including proposed changes or additions to these criteria. Originally published at Kaj Sotala. You can comment here or there.
Thursday, November 12th, 2015
10:42 am - Learning from painful experiences
 A model that I’ve found very useful is that pain is an attention signal. If there’s a memory or thing that you find painful, that’s an indication that there’s something important in that memory that your mind is trying to draw your attention to. Once you properly internalize the lesson in question, the pain will go away. That’s a good principle, but often hard to apply in practice. In particular, several months ago there was a social situation that I screwed up big time, and which was quite painful to think of afterwards. And I couldn’t figure out just what the useful lesson was there. Trying to focus on it just made me feel like a terrible person with no social skills, which didn’t seem particularly useful. Yesterday evening I again discussed it a bit with someone who’d been there, which helped relieve the pain a bit, enough that the memory wasn’t quite as aversive to look at. Which made it possible for me to imagine myself back in that situation and ask, what kinds of mental motions would have made it possible to salvage the situation? When I first saw the shocked expressions of the people in question, instead of locking up and reflexively withdrawing to an emotional shell, what kind of an algorithm might have allowed me to salvage the situation? Answer to that question: when you see people expressing shock in response to something that you’ve said or done, realize that they’re interpreting your actions way differently than you intended them. Starting from the assumption that they’re viewing your action as bad, quickly pivot to figuring out why they might feel that way. Explain what your actual intentions were and that you didn’t intend harm, apologize for any hurt you did cause, use your guess of why they’re reacting badly to acknowledge your mistake and own up to your failure to take that into account. If it turns out that your guess was incorrect, let them correct you and then repeat the previous step. 
That’s the answer in general terms, but I didn’t actually generate that answer by thinking in general terms. I generated it by imagining myself back in the situation, looking for the correct mental motions that might have helped out, and imagining myself carrying them out, saying the words, imagining their reaction. So that the next time that I’d be in a similar situation, it’d be associated with a memory of the correct procedure for salvaging it. Not just with a verbal knowledge of what to do in abstract terms, but with a procedural memory of actually doing it. That was a painful experience to simulate. But it helped. The memory hurts less now. Originally published at Kaj Sotala. You can comment here or there.
Saturday, October 31st, 2015
4:52 pm - Maverick Nannies and Danger Theses
Sunday, October 18th, 2015
1:01 pm - Changing language to change thoughts
Three verbal hacks that sound almost trivial, but which I’ve found to have a considerable impact on my thought:

1. Replace the word ‘should’ with either ‘I want’, or a good consequence of doing the thing.

Examples: “I should answer that e-mail soon.” -> “If I answered that e-mail, it would make the other person happy and free me from having to stress about it.” “I should have left that party sooner.” -> “If I had left that party before midnight, I’d feel more rested now.” “I should work on my story more at some point.” -> “I want to work on my story more at some point.”

Motivation: the more we think in terms of external obligations, the more we feel a lack of our own agency. Each thing that we “should” do is actually either something that we’d want to do because it would have some good consequences (avoiding bad consequences also counts as a good consequence), something that we have a reason for wanting to do differently the next time around, or something that we don’t actually have a good reason to do but just act out of a general feeling of obligation. If we only say “I should”, we will not only fail to distinguish between these cases, we will also be less motivated to do the things in cases where there is actually a good reason. The good reason will be less prominent in our thoughts, or possibly even entirely hidden behind the “should”. If you do try to rephrase “I should” as “I want”, you may either realize that you really do want it (instead of just being obligated to do it), or that you actually don’t want it and can’t come up with any good reason for doing it, in which case you might as well drop it.

Special note: there are some legitimate uses for “should”. In particular, it is the socially accepted way of acknowledging the other person when they give us an unhelpful suggestion.
“You should get some more exercise.” “Yeah I should.” (Translation: of course I know that, it’s not like you’re giving me any new information, and repeating things that I know isn’t going to magically change my behavior. But I figure that you’re just trying to be helpful, so let me acknowledge that and then we can talk about something else.) However, I suspect that because we’re used to treating “I should” as a reason to acknowledge the other person without needing to take actual action, the word also becomes more poisonous to motivation when we use it in self-talk, or when discussing matters with someone we want to actually be honest with. “Should” also tends to get used for guilt-tripping, so expressions like “I should have left that party sooner” might make us feel bad rather than focusing our attention on the benefits of having left earlier. The next time we’re at a party, the former phrasing incentivizes us to come up with excuses for why it’s okay to stay this time around. The latter encourages us to actually consider the benefits and costs of leaving earlier versus staying, and then choose the option that’s most appropriate.

2. Replace expressions like “I’m bad at X” with “I’m currently bad at X” or “I’m not yet good at X”.

Examples: “I can’t draw.” -> “I can’t draw yet.” “I’m not a people person.” -> “I’m currently not a people person.” “I’m afraid of doing anything like that.” -> “So far I’m afraid of doing anything like that.”

Motivation: the rephrased expression draws attention to the possibility that we could become better, and naturally leads us to think about ways in which we could improve ourselves. It again emphasizes our own agency and the fact that for a lot of things, being good or bad at them is just a question of practice. Even better, if you can trace the reason for your bad-ness, is to

3. Eliminate vague labels entirely and instead talk about specific missing subskills, or weaknesses that you currently have.
Examples: “I can’t draw.” -> “Right now I don’t know how to move beyond stick figures.” “I’m not a people person.” -> “I currently lock up if I try to have a conversation with someone.”

Motivation: figuring out the specific problem makes it easier to figure out what we would need to do if we wanted to address it, and might give us a self-image that’s both kinder and more realistic, in making the lack of skill a specific fixable problem rather than a personal flaw. Originally published at Kaj Sotala. You can comment here or there.
Friday, October 9th, 2015
5:36 pm - Rational approaches to emotions
Friday, October 2nd, 2015
9:03 am - Two conversationalist tips for introverts
Two of the biggest mistakes that I used to make that made me a poor conversationalist:

1. Thinking too much about what I was going to say next. If another person is speaking, don’t think about anything else, where “anything else” includes your next words. Instead, just focus on what they’re saying, and the next thing to say will come to mind naturally. If it doesn’t, a brief silence before you say something is not the end of the world. Let your mind wander until it comes up with something.

2. Asking myself questions like “is X interesting / relevant / intelligent-sounding enough to say here”, and trying to figure out whether the thing on my mind was relevant to the purpose of the conversation. Some conversations have an explicit purpose, but most don’t. They’re just the participants saying whatever random thing comes to their mind as a result of what the other person last said. Obviously you’ll want to put a bit of effort into screening off any potentially offensive or inappropriate comments, but for the most part you’re better off just saying whatever random thing comes to your mind.

Relatedly, I suspect that these kinds of tendencies are what make introverts experience social fatigue. Social fatigue seems [in some people’s anecdotal experience; I don’t have any studies to back me up here] to be associated with mental inhibition: the more you have to spend mental resources on holding yourself back, the more exhausted you will be afterwards. My experience suggests that if you can reduce the amount of filters on what you say, then this reduces mental inhibition, and correspondingly reduces the extent to which socializing causes you fatigue. Peter McCluskey reports a similar experience; other people mention varying degrees of agreement or disagreement. Originally published at Kaj Sotala. You can comment here or there.
Tuesday, August 18th, 2015
2:40 pm - Change blindness
Antidepressants are awesome. (At least they were for me.) It’s now been about a year since I started on SSRIs. Since my prescription is about to run out, I scheduled a meeting with a psychiatrist to discuss whether to stay on them. Since my health care provider has changed, I went to my previous one and got a copy of my patient records to bring to the new one. And wow. It’s kinda shocking to read them: my previous psychiatrist has written down things like: “Patient reports moments of despair and anguish of whether anything is going to lead to anything useful, and is worried for how long this will last. Recently there have been good days as well, but isn’t sure whether those will keep up.” And the psychologist I spoke with has written down: “At times has very negative views of the future, afraid that will never reach his goals.” And the thing is, reading that, I remember saying those things. I remember having those feelings of despair, of nothing ever working out. But I only remember them now, when I read through the records. I had mostly forgotten that I even did have those feelings. When I dig through my memory, I can find other such things. A friend commenting to me that, based on her observations, I seem to be roughly functional maybe about half the time. Me posting on social media that I have a constant anxiety, a need to escape, being unable to really even enjoy any free time I have. A feeling that taking even a major risk for the sake of feeling better would be okay, because I didn’t really have all that much to lose. Having regular Skype sessions with another friend, and feeling bad because he seemed to be getting a lot of things done, and my days just seemed to pass by without me managing to make much progress on anything. All of that had developed so gradually and over the years that it had never really even occurred to me that it wasn’t normal.
And then, after I got the antidepressants, those helped me get back on my feet, and then things gradually improved until I no longer even remembered the depths of what I had thought was normal, a year back. Change blindness. It’s a thing. For a less anecdotal take on the effects of SSRIs, see Scott Alexander’s SSRIs: Much More Than You Wanted to Know, a comprehensive look at the current studies. Originally published at Kaj Sotala. You can comment here or there.
Tuesday, July 7th, 2015
4:26 pm - DeepDream: Today psychedelic images, tomorrow unemployed artists
One interesting thing that I noticed about Google’s DeepDream algorithm (which you might also know as “that thing making all pictures look like psychedelic trips”) is that it seems to increase the image quality. For instance, my current Facebook profile picture was run through DD and looks sharper than the original, which was relatively fuzzy and grainy. Me, before and after drugs. If you know how DD works, this is not too surprising in retrospect. The algorithm, similar to the human visual system, works by first learning to recognize simple geometric shapes, such as (possibly curvy) lines. Then it learns higher-level features combining those lower-level features, like learning that you can get an eyeball by combining lines in a certain way. The DD algorithm looks for either low- or high-level features and strengthens them. Lines in a low-quality image are noisy versions of lines in a high-quality image. The DD algorithm has learned to “know” what lines “should” look like, so if you run it on the low-level setting, it takes anything possible that could be interpreted as a high-quality (possibly curvy) line and makes it one. Of course, what makes this fun is that it’s overly aggressive and also adds curvy lines that shouldn’t actually be there, but it wouldn’t necessarily need to do that. Probably with the right tweaking, you could make it into a general purpose image quality enhancer. A very good one, since it wouldn’t be limited to just using the information that was actually in the image. Suppose you gave an artist a grainy image of a church, and asked them to draw something using that grainy picture as a reference. They could use that to draw a very detailed and high-quality picture of a church, because they would have seen enough churches to imagine what the building in the grainy image should look like in real life. A neural net trained on a sufficiently large dataset of images would effectively be doing the same.
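To make the mechanism concrete, here is a minimal sketch (mine, not Google’s code) of the core DeepDream move: gradient ascent on the input image to amplify the response of a feature detector. Real DeepDream backpropagates through a trained convnet such as Inception; here a single hand-written 3×3 edge kernel stands in for a learned feature, which keeps the gradient computable by hand.

```python
import numpy as np

def deepdream_step(image, kernel, lr=0.1):
    """One gradient-ascent step that amplifies the response of a single
    linear feature detector (a stand-in for one unit of a trained net).

    The objective is L = sum over sliding windows of (window * kernel);
    because L is linear in the image, dL/d(image) is just the kernel
    accumulated over every window position that covers each pixel."""
    h, w = image.shape
    kh, kw = kernel.shape
    grad = np.zeros_like(image)
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            grad[i:i + kh, j:j + kw] += kernel
    # step the *image* in the direction that increases the feature response
    return image + lr * grad

# a 3x3 vertical-edge feature; stepping exaggerates edge-like structure
edge = np.array([[-1., 0., 1.]] * 3)
img = np.random.rand(8, 8)
out = deepdream_step(img, edge)
```

Iterating this step keeps pushing the feature response up, which is exactly how the full algorithm “finds” lines and eyeballs everywhere: anything that faintly matches a feature gets nudged toward matching it strongly.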
Suddenly, even if you were using a cheap and low-quality camera to take your photos, you could make them all look like high-quality ones. Of course, the neural net might be forced to invent some details, so your processed photos might differ somewhat from actual high-quality photos, but it would often be good enough. But why stop there? We’ve already established that the net could use its prior knowledge of the world to fill in details that aren’t necessarily in the original picture. After all, it’s doing that with all the psychedelic pictures. The next version would be a network that could turn sketches into full-blown artwork. Just imagine it. Maybe you’re making a game, and need lots of art for it, but can’t afford to actually pay an artist. So you take a neural net and feed it a large dataset of the kind of art you want. Then you start making sketches that aren’t very good, but are at least recognizable as elven rangers or something. You give that to the neural net and have it fill in the details and correct your mistakes, and there you go! If NN-generated art always had a distinctive, recognizable style, it’d probably quickly become seen as cheap and low status, especially if it wasn’t good at filling in the details. But it might not acquire that signature style, depending on how large of a dataset was actually needed for training it. Currently deep learning approaches tend to require very large datasets, but as time goes on, possibly you could do with less. And then you could get an infinite amount of different art styles, simply by combining any number of artists or art styles to get a new training set, feeding that to a network, and getting a blend of their styles to use. Possibly people might get paid to do nothing but look for good combinations of styles, and then sell the trained networks.
Using neural nets to generate art would be limited to simple 2D images at first, but you could imagine it getting to the point of full-blown 3D models and CGI eventually. And yes, this is obviously going to be used for porn as well. Here’s a bit of a creepy thing: nobody will need to hack the iCloud accounts of celebrities in order to get naked pictures of them anymore. Just take the picture of any clothed person, and feed it to the right network, and it’ll probably be capable of showing you what that picture would look like if the person was naked. Or associated with one of any number of kinks and fetishes. It’s interesting that for all the talk about robots stealing our jobs, we were always assuming that the creative class would basically be safe. Not necessarily so. How far are we from that? Hard to tell, but I would expect at least the image quality enhancement versions to pop up very soon. Neural nets can already be trained on text corpora and generate lots of novel text that almost kind of makes sense. Magic cards, too. I would naively guess image enhancement to be an easier problem than actually generating sensible text (which is something that seems AI-complete). And we just got an algorithm that can take two images of a scene and synthesize a third image from a different point of view, to name just the latest fun image-related result from my news feed. But then I’m not an expert on predicting AI progress (few if any people are), so we’ll see. EDITED TO ADD: On August 28th, less than two months after the publication of this article, the news broke of an algorithm that could learn to copy the style of an artist. Originally published at Kaj Sotala. You can comment here or there.
Saturday, June 6th, 2015
2:06 pm - Learning to recognize judgmental labels
In the spirit of Non-Violent Communication, today I’ve tried to pay more attention to my thoughts and notice any judgments or labels that I apply to other people that are actually disguised indications of my own needs. The first one that I noticed was this: within a few weeks I’ll be a visiting instructor at a science camp, teaching things to a bunch of teens and preteens. I was thinking of how I’d start my lessons, pondered how to grab their attention, and then noticed myself having the thought, “these are smart kids, I’m sure they’ll give me a chance rather than be totally unruly from the start”. Two judgments right there: “smart” and “unruly”. I stopped for a moment’s reflection. I’m going to the camp because I want the kids to learn things that I feel will be useful for them, yes, but at the same time I also have a need to feel respected and appreciated. And I feel uncertain of my ability to get that respect from someone who isn’t already inclined to view me in a favorable light. So in order to protect myself, I’m labelling kids as “smart” if they’re willing to give me a chance, implying that if I can’t get through to some particular one, then it was really their fault rather than mine. Even though they might be uninterested in what I have to say for reasons that have nothing to do with smarts, like me just making a boring presentation. Ouch. Okay, let me reword that original thought in non-judgemental terms: “these are kids who are voluntarily coming to a science camp and who I’ve been told are interested in learning, I’m sure they’ll be willing to listen at least to a bit of what I have to say”. There. Better. Originally published at Kaj Sotala. You can comment here or there.
Friday, May 29th, 2015
8:27 am - Adult children make mistakes, too
 There’s a lot of blame and guilt in many people’s lives. We often think of people in terms of good or bad, and feel unworthy or miserable if we fail at things we think we should be able to do. When we don’t do quite as well as we could, because we’re tired or unwell or distracted, we blame and belittle ourselves. Let’s take a different approach. Think of a young child, maybe three years old. He has come a long way from a newborn, but he’s still not that far along. If he tries his hand at making a drawing, and it’s not quite up to adult standards, we don’t think of him as being any worse for that. Or if he doesn’t quite want to share his toys or gets frustrated with his sibling, we understand that it’s because he’s still young, and hasn’t yet learned all the people skills. We don’t judge him for that, but just gently teach him what we’d like him to do instead. It’s not that he’s good or bad, it’s just that he lacks the skills and practice. At the same time, we see the vast potential in him, all the way that he has already come and the way he’s learning new things every day. Now, look at yourself from the perspective of some immensely wise, benevolent being. If you’re religious, that being could be God. If you have a transhumanist bent, maybe a superintelligent AI with understanding beyond human comprehension. Or you could imagine a vastly older version of you, one that had lived for thousands of years and seen and done things you couldn’t even imagine. From the perspective of such a being, aren’t you – and all those around you – the equivalent of that three-year-old? Someone who’s inevitably going to make mistakes and be imperfect, because the world is such a complicated place and nobody could have mastered it all? But who’s nevertheless come a long way from what they once were, and are only going to continue growing? 
Nate Soares has said that he feels more empathy towards people when he thinks of them as “monkeys who struggle to convince themselves that they’re comfortable in a strange civilization, so different from the ancestral savanna where their minds were forged”. Similarly, we could think of ourselves as young children outside their homes, in a world that’s much too complicated and vast for us to ever understand more than a small fraction of it, still making a valiant effort to do our best despite often being tired or afraid. Let’s take this attitude, not just towards others, but ourselves as well. We’re doing our best to learn to do the right things in a big, difficult world. If we don’t always succeed, there’s no blame: just a knowledge that we can learn to do better, if we make the effort. Originally published at Kaj Sotala. You can comment here or there.
Friday, May 8th, 2015
1:35 pm - Harry Potter and the Methods of Latent Dirichlet Allocation