A view to the gallery of my mind
Tuesday, November 20th, 2018
9:06 pm - Incorrect hypotheses point to correct observations
Tuesday, October 23rd, 2018
1:20 pm - Mark Eichenlaub: How to develop scientific intuition
Recently on the CFAR alumni mailing list, someone asked a question about how to develop scientific intuition. In response, Mark Eichenlaub posted an excellent and extensive answer, which was so good that I asked for permission to repost it in public. He graciously gave permission, so I’ve reproduced his message below. (He otherwise retains the rights to this, meaning that the standard CC license on my blog doesn’t apply to this post.)

From: Mark Eichenlaub
Date: Tue, Oct 23, 2018 at 9:34 AM
Subject: Re: [CFAR Alumni] Suggestions for developing scientific intuition

Sorry for the length, I recently finished a PhD on this topic. (After I wrote the answer kerspoon linked, I went to grad school to study the topic.) This is specifically about solving physics problems but hopefully speaks to intuition a bit more broadly in places.

I mostly think of intuition as the ability to quickly coordinate a large number of small heuristics. We know lots of small facts and patterns, and intuition is about matching the relevant ones onto the current situation. The little heuristics are often pretty local and small in scope.

For example, the other day I heard this physics problem:

You set up a trough with water in it. You hang just barely less than half of the trough off the edge of a table, so that it balances, but even a small force at the far end would make it tip over. You put a boat in the trough at the end over the table. The trough remains balanced. Then you slowly push the boat down to the other end of the trough, so that it’s in the part of the trough that hangs out from the table. What happens? (I.e., does the trough tip over?)

The answer is (rot13) Gur gebhtu qbrf abg gvc; vg erznvaf onynaprq (nf ybat nf gur zbirzrag bs gur obng vf fhssvpvragyl fybj fb gung rirelguvat erznvaf va rdhvyvoevhz).

I knew this “intuitively”, by which I mean I got it within a second or so of understanding the question, and without putting conscious effort into thinking about it.
(I wasn’t certain I was right until I had consciously thought it out, but I was reasonably confident within a second, and my intuition was borne out.)

I don’t think this was due to some sort of general intuition about problem solving, science, physics, mechanics, or even floating. It felt like I could solve the problem intuitively specifically because I had seen sufficiently-similar things that led me to the specific heuristic “a floating object spreads its weight out evenly over the bottom of the container it’s floating in.” Then I think of “having intuition” in physics as having maybe a thousand little rules like that and knowing when to call on which one.

For this particular heuristic, there is a classic problem asking what happens to the water level in a lake if you are in a boat with a rock, and you throw the rock into the water and it sinks to the bottom. One solution to that problem is that when the rock is on the bottom of the lake, it exerts more force on that part of the bottom of the lake than is exerted at other places. By contrast, when the rock is still in the boat, the only thing touching the bottom of the lake is water, and the water pressure is the same everywhere, so the weight of the rock is distributed evenly across the entire lake. The total force on the bottom of the lake doesn’t change between the two scenarios (because gravity pulls on everything just as hard either way), so when the rock is sitting on the bottom of the lake and the force on the bottom of the lake is higher under the rock, it must be lower everywhere else to compensate. The pressure everywhere else is $\rho g h$, so if that goes down, the level of the lake goes down. Conclusion: when you throw the rock overboard, the level of the lake goes down a bit.

When I thought about that problem, I presumably built the “weight distributed evenly” heuristic. All I had to do was quickly apply it to the trough problem to solve that one as well.
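The rock-and-boat argument can be sketched in symbols. The notation here is an assumption added for illustration and is not part of the original message: a flat-bottomed lake of area $A$, water density $\rho$, a rock of mass $m$ and density $\rho_r > \rho$, total weight $W$ of water, boat, and rock, water depth $h_1$ with the rock in the boat and $h_2$ with the rock on the bottom, and $N$ the extra force the resting rock exerts on the bottom beyond the water pressure there (its weight minus the buoyant force).

$$\rho g h_1 A = W$$

$$\rho g h_2 A + N = W, \qquad N = m g \left(1 - \frac{\rho}{\rho_r}\right) > 0$$

$$h_2 - h_1 = -\frac{N}{\rho g A} = -\frac{m}{\rho A}\left(1 - \frac{\rho}{\rho_r}\right) < 0$$

Under these assumptions the depth drops by exactly the amount needed to make room for $N$ in the force balance, which agrees with the usual displaced-volume argument: the floating rock displaces a volume $m/\rho$, the sunken rock only $m/\rho_r$.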
And if someone else also had a background in physics but didn’t find the trough problem easy, it’s probably because they simply hadn’t happened to think about the boat problem, or some other similar problems, in the right way, and hadn’t come away with the heuristic about the weight of floating things being spread out evenly.

To me, this picture of intuition as small heuristics doesn’t look good for the idea of developing powerful intuition. The “weight gets spread out by floating” heuristic is not likely to transfer to much else. I’ve used it for two physics problems about floating things and, as far as I know, nothing else.

You can probably think of lots of similar heuristics. For example, “conservation of expected evidence”. You might catch a mistake in someone’s reasoning, or an error in a long probability calculation you made, if you happen to notice that the argument or calculation violates conservation of expected evidence.

The nice thing about this is that it can happen almost automatically. You don’t have to stop after every calculation or argument and think, “does this break conservation of expected evidence?”. Instead, you wind up learning some sorts of triggers that you associate with the principle that prime it in your mind, and then, if it becomes relevant to the argument, you notice that and cite the principle.

In this picture, building intuition is about learning a large number of these heuristics, along with their triggers.

However, while the individual small heuristics are often the easiest things to point to in an intuitive solution to a problem, I do think there are more general, and therefore more transferable, parts of intuition as well. I imagine that the paragraph I wrote explaining the solution to the boat problem will be largely incomprehensible to someone who hasn’t studied physics. That’s partially because it uses concepts they won’t have a rigorous understanding of (e.g. pressure), partially because it tacitly uses small heuristics it didn’t explain (e.g. that the reason the pressure is the same along the bottom of the lake is that if it weren’t, there would be horizontal forces that push the water around until the pressure did equalize in this way), and partially because it made simplifications that it didn’t state and that might not obviously be justified (e.g. that the bottom of the lake is flat).

More importantly, it relies on a general framework of Newtonian mechanics. For example, there are a number of tacit applications of Newton’s laws in the argument. I stated that the total force on the bottom of the lake is the same whether the rock is resting on the bottom or floating in the boat “because gravity pulls on everything just as hard either way”, but these aren’t directly connected concepts. Gravity pulls the system (boat + water + rock) down just as hard no matter where the rock is. That system is not accelerating, so by Newton’s second law, the bottom of the lake pushes up on that system just as hard in each scenario. And by Newton’s third law, the system pushes down on the bottom of the lake just as hard in each scenario.

So understanding the argument involves some fairly general heuristics such as “apply Newton’s second law to an object in equilibrium to show that two forces on it have equal magnitude” – a heuristic I’ve used hundreds of times, and “decide what objects to define as part of a system fluidly as you go through a problem” (in this case, switching from thinking about the rock as a system to thinking about rock+boat+water as a single system) – a skill I’ve used hundreds to thousands of times across all of physics. (My job is to teach high schoolers to be really good at solving problems like this, so I spend way more time on it than most people, so applying a heuristic specific to solving introductory physics problems in a thousand independent instances is realistic for me.)
Then there may be more meta-level skills and heuristics that you develop in solving problems. These could be things like valuing non-calculation solutions, or believing that persevering on a tough problem is worthwhile.

It’s also important that intuition isn’t just about having lots of little heuristics. It’s about organizing them and calling the right one up at the right time. You’ll have to ask yourself the right sorts of questions to prompt yourself to find the right heuristics, and that’s probably a pretty general skill.

There is a fair amount of research on trying to understand what all these little heuristics are and how to develop them, but I’m mostly familiar with the research in physics. In the Quora answer kerspoon linked, I cited George Lakoff, and I still think that he’s a good source for understanding how we go about taking primitive sorts of concepts (e.g. “up” and “down”) and using and adapting them, via partial metaphor, to understanding more abstract things. For a specific example that’s well-argued, see:

Wittmann, Michael C., and Katrina E. Black. “Mathematical actions as procedural resources: An example from the separation of variables.” Physical Review Special Topics-Physics Education Research 11.2 (2015): 020114.

They argue that students understand the arithmetic action “separation of variables” via analogy to their physical understanding of taking things and physically moving them around.

However, I think Wittmann and Black’s work is incomplete. For example, it doesn’t explain why students using the motion analogy for separation of variables do it correctly – they could just as well use motion to encode algebraically-invalid rules. Also, they don’t explain how the analogy develops. They just catalog that it exists.

A foundational work in trying to understand the components of physical intuition is:

DiSessa, Andrea A. “Toward an epistemology of physics.” Cognition and instruction 10.2-3 (1993): 105-225.
This work establishes “phenomenological primitives”, little core heuristics such as “near is more”, which are templates for physical reasoning. Drawing from these templates, we might conclude that the nearer you are to a speaker, the louder the sound, or that the nearer you are to the sun, the hotter it will be (and therefore that summer is hot because the Earth is nearer the sun – a false but common and reasonable belief).

That’s a long and somewhat-obscure paper. I really like his student’s work:

Sherin, Bruce L. “How students understand physics equations.” Cognition and instruction 19.4 (2001): 479-541.

Like diSessa, Sherin builds his own framework for what intuition is. His scope is more limited though, focusing solely on building and interpreting certain types of equations in a manner that combines “intuitive” physical ideas and mathematical templates. He spells this out in more detail in the paper, and it’s incredibly clear and well-argued. Probably my favorite paper in the field.

A more general reference that’s much more accessible than diSessa, and a more general overview of cognition in physics than Sherin, is “How Should We Think About How Our Students Think” by my advisor, Joe Redish:

http://media.physics.harvard.edu/video/?id=COLLOQ_REDISH_093013 (video)
https://arxiv.org/abs/1308.3911 (paper)

The actual process of building new heuristics is also studied, but overall I don’t think we know all that much. See my friend Ben’s paper

Dreyfus, Benjamin W., Ayush Gupta, and Edward F. Redish. “Applying conceptual blending to model coordinated use of multiple ontological metaphors.” International Journal of Science Education 37.5-6 (2015): 812-838.

for an example of theory-building around how we create new intuitions. He calls on a framework from cognitive science called “conceptual blending” that is rather formal, but I think pretty entertaining to read.
A relevant search term in the education literature is “conceptual change”, but I find a lot of this literature to be hard to follow and not always a productive use of time to read.

On the applied side, I think the state of the art in evidence-backed approaches to building intuition, at least in physics, is modeling instruction. I’m not sure what the best introduction to modeling instruction is. They have a website that seems okay. Eric Brewe writes on it and he’s usually very good. The basic idea is to have students collaboratively participate in the building of the theories of physics they’re using (in a specific way, with guidance and direction from a trained instructor), which gets them to think about the “whys” involved with a particular theory or model in a way they usually wouldn’t.

I have written some about why I think things like checking the extreme cases of a formula are powerful intuition-building tools. A preprint is available here: https://arxiv.org/pdf/1804.01639.pdf

However, I think it’s dangerous to have rules like “always check the dimensions of your answer”, “always check the extreme cases of a formula”, or even “always check that the numbers come out reasonable.” The reason is that having these things as procedures tends to encourage students to follow them by rote. A large part of the cognitive work involved isn’t in checking the extreme cases or the dimensions, but in realizing that in this particular situation, that would be a good thing to do. If you’re doing it only because an external prompt is telling you to, you aren’t building the appropriate meta-level habits. See https://www.tandfonline.com/doi/abs/10.1080/09500693.2017.1308037 for an example of this effect.

See papers on “metarepresentation” by diSessa and/or Sherin for another example of generalizable skills related to intuition and problem solving.

Unfortunately, I don’t think writing books well or writing courses of individual study is something we know much about.
I don’t know anyone who has a significant grant for that; the most I’ve ever seen on it is a poster here or there at a conference. Generally, grants are awarded for improving high school and college courses, or for professional development programs, supporting department or institution level changes at schools, etc. So adults who just want to learn on their own are not really served much by the research in the area.

If you’re an adult who wants to self-study theoretical physics with an eye towards intuition, I recommend Leonard Susskind’s series of courses “The Theoretical Minimum” (the first three courses exist as books, the rest only as video lectures). He approaches mathematical topics with what I find an intuitive approach in most cases. Of course the Feynman lectures on physics are also very good.

I’ll be building an introduction to physics course at Art of Problem Solving, starting work sometime this winter. It might be available in the spring, although students will mostly be middle and high school students (but anyone is welcome to take our courses). I currently teach an advanced physics problem-solving course at AoPS called “PhysicsWOOT”. I try to support intuition-building practices there, but the main aim is in training these many small heuristics which students need to solve contest problems.

There should be something like modeling instruction for adult independent learners, but I don’t know of it.

Originally published at Kaj Sotala.
Tuesday, October 9th, 2018
7:40 am - On insecurity as a friend
There’s a common narrative about confidence that says that confidence is good, insecurity is bad. It’s better to develop your confidence than to be insecure. There’s an obvious truth to this.

But what that narrative does not acknowledge, and what both a person struggling with insecurity and their well-meaning friends might miss, is that that insecurity may be in place for a reason.

You might not notice it online, but I’ve usually been pretty timid and insecure in real life. But this wasn’t always the case. There were occasions earlier in my life when I was less insecure, more confident in myself.

I was also pretty horrible at things like reading social nuance and figuring out when and why someone might be offended. So I was given, repeatedly, the feedback that my behavior was bad and inappropriate.

Eventually a part of me internalized that as “I’m very likely to accidentally offend the people around me, so I should be very cautious about what I say, ideally saying nothing at all”.

This was, I think, the correct lesson to internalize at that point! It shifted me more into an observer mode, allowing me to just watch social situations and learn more about their dynamics that way. I still don’t think that I’m great at reading social nuance, but I’m at least better at it than I used to be.

And there have been times since then when I’ve decided that I should act with more confidence, and just get rid of the part that generates the insecurity. I’ve been about to do something, felt a sense of insecurity, and walked over the feeling and done the thing anyway.

Sometimes this has had good results. But often it has also led to things blowing up in my face, with me inadvertently hurting someone and leaving me feeling guilty for months afterwards.

Turns out, that feeling of insecurity wasn’t a purely bad thing. It was throwing up important alarms which I chose to ignore, alarms which were sounding because it recognized my behavior as matching previous behavior which had had poor consequences.

Yes, on many occasions that part of me makes me way too cautious. And it would be good to moderate that caution a little.

But the part which generates the feelings of insecurity is the same part which is constantly working to model other people and their experience, their reactions to me. The part that is doing its hardest to make other people feel safe and comfortable around me, to avoid doing things that would make them feel needlessly hurt or upset or unsafe, and to actively let them know that I’m doing this.

Just carving out that part would be a mistake. A moral wrong, even.

The answer is not to get rid of it. The answer is to integrate its cautions better, to keep it with me as a trusted friend and ally – one which feels safe enough about getting its warnings listened to, that it will not scream all the time just to be heard.

Originally published at Kaj Sotala.
Sunday, August 12th, 2018
9:04 am - New paper: Long-Term Trajectories of Human Civilization
Long-Term Trajectories of Human Civilization (free PDF). Foresight, forthcoming, DOI 10.1108/FS-04-2018-0037.

Authors: Seth D. Baum, Stuart Armstrong, Timoteus Ekenstedt, Olle Häggström, Robin Hanson, Karin Kuhlemann, Matthijs M. Maas, James D. Miller, Markus Salmela, Anders Sandberg, Kaj Sotala, Phil Torres, Alexey Turchin, and Roman V. Yampolskiy.

Abstract

Purpose: This paper formalizes long-term trajectories of human civilization as a scientific and ethical field of study. The long-term trajectory of human civilization can be defined as the path that human civilization takes during the entire future time period in which human civilization could continue to exist.

Approach: We focus on four types of trajectories: status quo trajectories, in which human civilization persists in a state broadly similar to its current state into the distant future; catastrophe trajectories, in which one or more events cause significant harm to human civilization; technological transformation trajectories, in which radical technological breakthroughs put human civilization on a fundamentally different course; and astronomical trajectories, in which human civilization expands beyond its home planet and into the accessible portions of the cosmos.

Findings: Status quo trajectories appear unlikely to persist into the distant future, especially in light of long-term astronomical processes. Several catastrophe, technological transformation, and astronomical trajectories appear possible.

Value: Some current actions may be able to affect the long-term trajectory. Whether these actions should be pursued depends on a mix of empirical and ethical factors. For some ethical frameworks, these actions may be especially important to pursue.

An excerpt from the press release over at the Global Catastrophic Risk Institute:

Society today needs greater attention to the long-term fate of human civilization. Important present-day decisions can affect what happens millions, billions, or trillions of years into the future. The long-term effects may be the most important factor for present-day decisions and must be taken into account. An international group of 14 scholars calls for the dedicated study of “long-term trajectories of human civilization” in order to understand long-term outcomes and inform decision-making. This new approach is presented in the academic journal Foresight, where the scholars have made an initial evaluation of potential long-term trajectories and their present-day societal importance.

“Human civilization could end up going in radically different directions, for better or for worse. What we do today could affect the outcome. It is vital that we understand possible long-term trajectories and set policy accordingly. The stakes are quite literally astronomical,” says lead author Dr. Seth Baum, Executive Director of the Global Catastrophic Risk Institute, a non-profit think tank in the US.

The group of scholars, including Olle Häggström, Robin Hanson, Karin Kuhlemann, Anders Sandberg, and Roman Yampolskiy, have identified four types of long-term trajectories: status quo trajectories, in which civilization stays about the same; catastrophe trajectories, in which civilization collapses; technological transformation trajectories, in which radical technology fundamentally changes civilization; and astronomical trajectories, in which civilization expands beyond our home planet.

Available here: https://kajsotala.fi/assets/2018/08/trajectories.pdf

Originally published at Kaj Sotala.
Friday, August 3rd, 2018
3:00 pm - Finland Museum Tour 1/??: Tampere Art Museum
Sunday, March 18th, 2018
10:15 am - Is the Star Trek Federation really incapable of building AI?
In the Star Trek universe, we are told that it’s really hard to make genuine artificial intelligence, and that Data is so special because he’s a rare example of someone having managed to create one.

But this doesn’t seem to be the best hypothesis for explaining the evidence that we’ve actually seen. Consider:

In the TOS episode “The Ultimate Computer”, the Federation has managed to build a computer intelligent enough to run the Enterprise on its own, but it goes crazy and Kirk has to talk it into self-destructing.

In TNG, we find out that before Data, Doctor Noonian Soong had built Lore, an android with sophisticated emotional processing. However, Lore became essentially evil and had no problems killing people for his own benefit. Data worked better, but in order to get his behavior right, Soong had to initially build him with no emotions at all. (TNG: “Datalore”, “Brothers”)

In the TNG episode “Evolution”, Wesley is doing a science project with nanotechnology, accidentally enabling the nanites to become a collective intelligence which almost takes over the ship before the crew manages to negotiate a peaceful solution with them.

The holodeck seems entirely capable of running generally intelligent characters, though their behavior is usually restricted to specific roles. However, on occasion they have started straying outside their normal parameters, to the point of attempting to take over the ship. (TNG: “Elementary, Dear Data”) It is also suggested that the computer is capable of running an indefinitely long simulation which is good enough to make an intelligent being believe that it is the real universe. (TNG: “Ship in a Bottle”)

The ship’s computer in most of the series seems like it’s potentially quite intelligent, but most of the intelligence isn’t used for anything other than running holographic characters.

In the TNG episode “Booby Trap”, a potential way of saving the Enterprise from the Disaster Of The Week would involve turning over control of the ship to the computer: however, the characters are inexplicably super-reluctant to do this.

In Voyager, the Emergency Medical Hologram clearly has general intelligence: however, it is only supposed to be used in emergency situations rather than running long-term, its memory starting to degrade after a sufficiently long time of continuous use. The recommended solution is to reset it, removing all of the accumulated memories since its first activation. (VOY: “The Swarm”)

There seems to be a pattern here: if an AI is built to carry out a relatively restricted role, then things work fine. However, once it is given broad autonomy and it gets to do open-ended learning, there’s a very high chance that it gets out of control. The Federation witnessed this for the first time with the Ultimate Computer. Since then, they have been ensuring that all of their AI systems are restricted to narrow tasks or that they’ll only run for a short time in an emergency, to avoid things getting out of hand. Of course, this doesn’t change the fact that your AI having more intelligence is generally useful, so e.g. starship computers are equipped with powerful general intelligence capabilities, which sometimes do get out of hand.

Dr. Soong’s achievement with Data was not in building a general intelligence, but in building a general intelligence which didn’t go crazy. (And before Data, he failed at that task once, with Lore.)

The Federation’s issue with AI is not that they haven’t solved artificial general intelligence. The Federation’s issue is that they haven’t reliably solved the AI alignment problem.

Originally published at Kaj Sotala.
Monday, February 12th, 2018
11:33 am - Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.

The main idea here is that most of the discussion about risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to miss out on alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios that have lots of burdensome details.

To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.
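The difference between one detailed scenario and a disjunction of paths can be illustrated with a little arithmetic. This is a minimal sketch, assuming that the steps (or paths) are independent and using made-up probabilities; neither the independence assumption nor the numbers come from the paper, and the function names are hypothetical.

```python
from math import prod


def conjunctive_probability(step_probs):
    """Probability that every step of one detailed scenario happens (independence assumed)."""
    return prod(step_probs)


def disjunctive_probability(path_probs):
    """Probability that at least one of several alternative paths happens (independence assumed)."""
    return 1 - prod(1 - p for p in path_probs)


# Illustrative numbers only: three steps or three paths, each given a 50% chance.
probs = [0.5, 0.5, 0.5]
print(conjunctive_probability(probs))  # one scenario that needs all three steps
print(disjunctive_probability(probs))  # any one of three alternative paths
```

With these illustrative numbers, the single detailed scenario comes out at 0.125 while the disjunction comes out at 0.875, which is the sense in which each added detail is a burden on a specific story and each added path strengthens a disjunctive one.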

Superintelligence versus Crucial Capabilities

Bostrom’s Superintelligence, as well as a number of other sources, basically make the following argument:

1. An AGI could become superintelligent
2. Superintelligence would enable the AGI to take over the world

This is an important argument to make and analyze, since superintelligence basically represents an extreme case: if an individual AGI may become as powerful as it gets, how do we prepare for that eventuality? As long as there is a plausible chance for such an extreme case to be realized, it must be taken into account.

However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason why we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI which also had the cognitive capabilities necessary for taking over the world, and only those?

Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.

That raises the question of what might be such crucial capabilities. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, biological warfare. Others no doubt exist.

A possibly useful framing for future investigations might be, “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous”, where the definition of “dangerous” is free to vary based on how serious a risk we are concerned about. One complication here is that this is a highly contextual question – with a superintelligence we can assume that the AGI may get basically omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk, and there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.

Incidentally, this shows one reason why the common criticism of “superintelligence isn’t something that we need to worry about because intelligence isn’t unidimensional” is ill-founded – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just the ones we care about.

How would the AGI get free and powerful?

In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.

However, there are also alternative scenarios where the AGI’s creators voluntarily let it free – or even place it in control of e.g. a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their jobs by going against the advice.

Would the Treacherous Turn involve a Decisive Strategic Advantage?

Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?

Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if they prepare for long enough. But the longer that they prepare, the more likely it becomes that the other AGI sets its plans in motion first, and achieves an advantage over the other. Thus, if several AGI projects exist, each AGI is incentivized to take action at such a point which maximizes its overall probability of success – even if the AGI only had rather slim chances of succeeding in the takeover, if it thought that waiting for longer would make its chances even worse.
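
The timing argument above can be made concrete with a toy model. This is only an illustrative sketch and not something taken from the paper: it assumes, hypothetically, that an AGI's chance of a successful takeover grows linearly with its preparation time, and that at every time step a rival AGI has a fixed chance of moving first. The names `readiness`, `rival_hazard` and `best_time_to_act`, as well as all the numbers, are made up for illustration.

```python
def readiness(t: int, full_prep: int = 100) -> float:
    """Hypothetical chance that an attempt at step t succeeds, assuming no
    rival has moved first (a linear ramp that is capped at 1)."""
    return min(1.0, t / full_prep)


def success_probability(t: int, rival_hazard: float, full_prep: int = 100) -> float:
    """Overall chance of success when acting at step t: the rival must not
    have acted during any of the t earlier steps, and the attempt must work."""
    rival_has_not_acted = (1.0 - rival_hazard) ** t
    return rival_has_not_acted * readiness(t, full_prep)


def best_time_to_act(rival_hazard: float, full_prep: int = 100) -> int:
    """Step in 0..full_prep that maximizes the overall success probability."""
    return max(
        range(full_prep + 1),
        key=lambda t: success_probability(t, rival_hazard, full_prep),
    )


if __name__ == "__main__":
    # Assumed per-step chances that a rival acts first (illustrative only).
    for hazard in (0.0, 0.03, 0.15):
        t = best_time_to_act(hazard)
        p = success_probability(t, hazard)
        print(f"rival hazard {hazard:.2f}: act at step {t}, success chance {p:.2f}")
```

In this sketch, an AGI with no rival waits until it is fully prepared. Once there is even a modest per-step chance of being pre-empted, the best moment to act comes much earlier and at a far lower chance of success, which is the situation the paragraph above describes.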

Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.

A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.

“Mere” catastrophic risks could also turn into existential ones, if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged, and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.

Summary table and example scenarios

The table below summarizes the various alternatives explored in the paper.

AI’s level of strategic advantage: Decisive; Major
AI’s capability threshold for non-cooperation: Very low to very high, depending on various factors
Sources of AI capability:
  Individual takeoff: Hardware overhang; Speed explosion; Intelligence explosion
  Collective takeoff
  Crucial capabilities: Biowarfare; Cyberwarfare; Social manipulation; Something else
  Gradual shift in power
Ways for the AI to achieve autonomy:
  Escape: Social manipulation; Technical weakness
  Voluntarily released: Economic or competitive reasons; Criminal or terrorist reasons; Ethical or philosophical reasons; Desperation; Confidence (in lack of capability; in values)
  Confined but effectively in control
Number of AIs: Single; Multiple

And here are some example scenarios formed by different combinations of them:

The classic takeover

(Decisive strategic advantage, high capability threshold, intelligence explosion, escaped AI, single AI)

The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.

The gradual takeover

(Major strategic advantage, high capability threshold, gradual shift in power, released for economic reasons, multiple AIs)

Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but by now a lot of stuff is built around them, it brings a profit, and they’re really good at giving us nice stuff – for the time being.

The wars of the desperate AIs

(Major strategic advantage, low capability threshold, crucial capabilities, escaped AIs, multiple AIs)

Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.

Is humanity feeling lucky?

(Decisive strategic advantage, high capability threshold, crucial capabilities, confined but effectively in control, single AI)

Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.

This blog post was written as part of work for the Foundational Research Institute.

Originally published at Kaj Sotala. You can comment here or there.


Thursday, January 25th, 2018
7:25 pm - On not getting swept away by mental content
There’s a specific subskill of meditation that I call “not getting swept away by the content”, which I think is generally valuable. It goes like this.

You sit down to meditate and focus on your breath or whatever, and then a worrying thought comes to your mind. And it’s a real worry, something important. And you are tempted to start thinking about it and pondering it and getting totally distracted from your meditation… because this is something that you should probably be thinking about, at some point.

So there’s a mental motion that you make, where you note that you are getting distracted by the content of a thought. The worry, even if valid, is content. If you start thinking about whether you should be engaging with the worry, those thoughts are also content. And you are meditating, meaning that this is the time when you shouldn’t be focusing on content. Anything that is content, you dismiss, without examining what that content is.

So you dismiss the worry. It was real and important, but it was content, so you are not going to think about it now. You feel happy about having dismissed the content, and you start thinking about how good of a meditator you are, and… realize that this, too, is a thought that you are getting distracted by. So you dismiss that thought, too. Doesn’t matter what the content of the thought is, now is not the time.

And then you keep letting go of thoughts that come to your mind, but that doesn’t seem to do anything and you start to wonder whether you are doing this meditation thing right… and aha, that’s content too. So you dismiss that…

The thing that is going on here is that when you experience a distracting thought and want to get rid of it, you often start engaging in an evaluation process of whether that thought should be dismissed or not. By doing so, you may end up engaging with the thought’s own internal logic – which might be totally wrong for the situation.

Yes, maybe your relationship is in tatters and your partner is about to leave you. And maybe there are things that you can do to avoid that fate. Or maybe there are not. But if you try to dismiss the thought by disputing the truth or importance of those things, you will fail. Because they are true and important.

The way to short-circuit that is to move the evaluation a meta-level up and just decide that whatever is content gets dismissed on that basis. Doesn’t matter if it’s true. It’s content, so not what you are doing now. You avoid getting entangled with the thought’s internal logic, because you never engage with the internal logic in the first place.

Having this mental motion available to you is also useful outside meditation, if you are prone to having any other thoughts that aren’t actually useful.

As I write this, I’m sitting at a food place, eating the food and watching the traffic outside. And, like I often am, I am bothered by pessimistic thoughts about the future of humanity, and all the different disasters that could befall the world. Yeah, I could live to see the day when AIs destroy the world, or worse.

That’s true. That’s also content. I’m not going to engage with that content right now.

Hmm. I look outside the window, watch cars pass by, and finish my dinner. The food is tasty.
Thursday, January 4th, 2018
9:53 am - Papers for 2017
I had three new papers either published or accepted into publication last year; all of them are now available online:

How Feasible is the Rapid Development of Artificial Superintelligence? Physica Scripta 92 (11), 113001.
Abstract: What kinds of fundamental limits are there in how capable artificial intelligence (AI) systems might become? Two questions in particular are of interest: 1) How much more capable could AI become relative to humans, and 2) how easily could superhuman capability be acquired? To answer these questions, we will consider the literature on human expertise and intelligence, discuss its relevance for AI, and consider how AI could improve on humans in two major aspects of thought and expertise, namely simulation and pattern recognition. We find that although there are very real limits to prediction, it seems like AI could still substantially improve on human intelligence.
Links: published version (paywalled), free preprint.

Disjunctive Scenarios of Catastrophic AI Risk. AI Safety and Security (Roman Yampolskiy, ed.), CRC Press. Forthcoming.
Abstract: Artificial intelligence (AI) safety work requires an understanding of what could cause AI to become unsafe. This chapter seeks to provide a broad look at the various ways in which the development of AI sophisticated enough to have general intelligence could lead to it becoming powerful enough to cause a catastrophe. In particular, the present chapter seeks to focus on the way that various risks are disjunctive – on how there are multiple different ways by which things could go wrong, any one of which could lead to disaster. We cover different levels of a strategic advantage an AI might acquire, alternatives for the point where an AI might decide to turn against humanity, different routes by which an AI might become dangerously capable, ways by which the AI might acquire autonomy, and scenarios with varying number of AIs. Whereas previous work has focused on risks specifically only from superintelligent AI, this chapter also discusses crucial capabilities that could lead to catastrophic risk and which could emerge anywhere on the path from near-term “narrow AI” to full-blown superintelligence.
Links: free preprint.

Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica 41 (4). (with Lukas Gloor)
Abstract: Discussions about the possible consequences of creating superintelligence have included the possibility of existential risk, often understood mainly as the risk of human extinction. We argue that suffering risks (s-risks), where an adverse outcome would bring about severe suffering on an astronomical scale, are risks of a comparable severity and probability as risks of extinction. Preventing them is the common interest of many different value systems. Furthermore, we argue that in the same way as superintelligent AI both contributes to existential risk but can also help prevent it, superintelligent AI can both be a suffering risk or help avoid it. Some types of work aimed at making superintelligent AI safe will also help prevent suffering risks, and there may also be a class of safeguards for AI that helps specifically against s-risks.
Links: published version (open access).

In addition, my old paper Responses to Catastrophic AGI Risk (w/ Roman Yampolskiy) was republished, with some minor edits, as the book chapters “Risks of the Journey to the Singularity” and “Responses to the Journey to the Singularity”, in The Technological Singularity: Managing the Journey (Victor Callaghan et al., eds.), Springer-Verlag.
Friday, December 8th, 2017
2:10 pm - Fixing science via a basic income
Monday, December 4th, 2017
1:02 pm - Book review: The Upside of Your Dark Side: Why Being Your Whole Self–Not Just Your “Good
The Upside of Your Dark Side: Why Being Your Whole Self–Not Just Your “Good” Self–Drives Success and Fulfillment. By Todd Kashdan & Robert Biswas-Diener. Avery, 2014.

This book was written by a pair of psychologists who thought that the excessive focus on good and positive feelings in positive psychology was a little overblown, and that the value of so-called “negative” feelings or aspects of personality was being neglected. They do think that it’s good for us to be happy most of the time, but that it will be even better for us if we have a flexibility that allows us to switch to non-happy states of mind when it’s beneficial. They suggest an 80:20 ratio as a rough rule of thumb: be happy 80% of the time and non-happy 20% of the time. They call this philosophy “wholeness”: a person is whole if they are able to flexibly tap into all aspects of their being when it’s warranted.

The authors offer a number of examples about the value of so-called negative states. Too much comfort makes us oversensitive to inevitable discomfort. Anger motivates us to act, fix injustices, and defend ourselves and our loved ones; guilt tells us when we’ve screwed up and motivates us to improve our behavior; anxiety helps us catch mistakes and take safeguards against risks. Happy people are less persuasive, can be too trusting, and are lazier thinkers. Intentionally trying to become happy easily backfires and makes us less happy; and there are situations where happiness feels inappropriate and will make others respond worse to you. Sometimes it’s better to act on instinct or engage in mind-wandering than to always be mindful and think things through consciously. The “dark triad” traits of narcissism, Machiavellianism, and psychopathy are all useful in moderation and provide benefits such as fearlessness and self-assuredness.

The following paragraph from the final chapter is a pretty good summary of the book’s message:

“The basic idea is that psychological states are instrumental. That is, they are useful for a specific purpose, such as finding your car keys, being physically safe in a parking garage, negotiating a business deal, or arguing with your child’s teacher. Rather than viewing your thoughts and feelings as reactions to external events, we argue that you ought to view these states as tools to be used as circumstances warrant. Simply put, quit labeling your inner states as good or bad or positive or negative, and start thinking of them as useful or not useful for any given situation.”

While I liked the book’s message and agreed with many of its points, I felt like it was mostly trying to tell a story that sounds plausible to a layman, rather than making a particularly rigorous argument. The authors tend to base their claims on isolated studies with no mention of their replication status; some of their example studies draw on paradigms and methods that have been seriously challenged (social priming and implicit association tests); occasionally they made claims that I thought contradicted things I knew from elsewhere; and some of the cited empirical results seem to have alternative interpretations that are more natural than the ones offered in the book. It’s plausible that they are drawing on much more rigorous academic work and that the argument has been dumbed down for a popular audience: even granting them the benefit of the doubt, the book still feels way too much like a collection of examples that have been cherry-picked to make the wanted points.

Regardless, the book’s general message feels almost certainly correct – after all, why would we have evolved negative states if they weren’t sometimes useful? – so if anyone feels like they’ve been overwhelmed with too many messages of positivity, I would recommend this book for inspiration and an alternative viewpoint, if not for any of its specific details.
Monday, November 6th, 2017
1:05 pm - Meditation and mental space
One effect that I often notice, after my meditation practice has been interrupted and I then manage to resume it, is an increase in a kind of mental resilience.

That is, when I have a lower resilience, feeling bad for any reason feels much more like an emergency. It’s something that forces itself into my consciousness, takes over, and refuses to go away. I would like to ignore it, but I can’t; as long as it’s there, it’s hard to think of anything else.

When my resilience is higher, it’s like my mind has more room for thoughts and emotions. Something might be making me feel bad, but something else might also be making me feel good, and there’s space for those two to intermingle. It becomes much easier to accept that I’m feeling a little bad, but I don’t need to do anything about it. I can just go on and do something else, and the nasty feeling might go away on its own – or if it doesn’t, that’s fine too.

Interestingly, being on antidepressants can also give me a similar effect.

Of course, in itself this kind of an effect isn’t too surprising, given that it’s one of the explicit goals of the practice. Culadasa’s The Mind Illuminated notes that two of the goals of mindfulness practice are an increase in the amount of “conscious power” (roughly, the amount of things that can be consciously processed at a time), as well as learning to more intentionally shift the focus of attention, so that it won’t just automatically go to the most painful or pleasant thing and become preoccupied with that, but can rather be controlled in a more useful manner. Still, it’s nice to see that the practice is bearing fruit.
Tuesday, October 17th, 2017
10:19 am - Anti-tribalism and positive mental health as high-value cause areas
I think that tribalism is one of the biggest problems with humanity today, and that even small reductions of it could cause a massive boost to well-being.

By tribalism, I basically mean the phenomenon where arguments and actions are primarily evaluated based on who makes them and which group they seem to support, not anything else. E.g. if a group thinks that X is bad, then it’s often seen as outright immoral to make an argument which would imply that X isn’t quite as bad, or that some things which are classified as X would be more correctly classified as non-X instead. I don’t want to give any specific examples so as to not derail the discussion, but hopefully everyone can think of some; the article “Can Democracy Survive Tribalism” lists a lot of them, picked from various sides of the political spectrum.

Joshua Greene (among others) makes the argument, in his book Moral Tribes, that tribalism exists for the purpose of coordinating aggression and alliances against other groups (so that you can kill them and take their stuff, basically). It specifically exists for the purpose of making you hurt others, as well as defend yourself against people who would hurt you. And while defending yourself against people who would hurt you is clearly good, attacking others is clearly not. And everything being viewed in tribal terms means that we can’t make much progress on things that actually matter: as someone commented, “people are fine with randomized controlled trials in policy, as long as the trials are on things that nobody cares about”.

Given how deep tribalism sits in the human psyche, it seems unlikely that we’ll be getting rid of it anytime soon. That said, there do seem to be a number of things that affect the amount of tribalism we have:

* As Steven Pinker argues in The Better Angels of Our Nature, violence in general has declined over historical time, replaced by more cooperation and an assumption of human rights; Democrats and Republicans may still hate each other, but they generally agree that they still shouldn’t be killing each other.
* As a purely anecdotal observation, I seem to get the feeling that people on the autism spectrum tend to be less tribal, up to the point of not being able to perceive tribes at all. (This suggests, somewhat oddly, that the world would actually be a better place if everyone was slightly autistic.)
* Feelings of safety or threat seem to play a lot into feelings of tribalism: if you perceive (correctly or incorrectly) that a group Y is out to get you and that they are a real threat to you, then you will react much more aggressively to any claims that might be read as supporting Y. Conversely, if you feel safe and secure, then you are much less likely to feel the need to attack others.

The last point is especially troublesome, since it can give rise to self-fulfilling predictions. Say that Alice says something to Bob, and Bob misperceives this as an insult; Bob feels threatened so snaps at Alice, and now Alice feels threatened as well, so shouts back. The same kind of phenomenon seems to be going on at a much larger scale: whenever someone perceives a threat, they are no longer willing to give someone the benefit of the doubt, and would rather treat the other person as an enemy. (Which isn’t too surprising, since it makes evolutionary sense: if someone is out to get you, then the cost of misclassifying them as a friend is much bigger than the cost of misclassifying a would-be friend as an enemy. You can always find new friends, but it only takes one person to get near you and hurt you really badly.)

One implication might be that general mental health work, not only in the conventional sense of “healing disorders”, but also the positive psychology-style mental health work that actively seeks to make people happy rather than just fine, could be even more valuable for society than we’ve previously thought. Curing depression etc. would be enormously valuable even by itself, but if we could figure out how to make people generally happier and resilient to negative events, then fewer things would threaten their well-being and they would perceive fewer things as being threats, reducing tribalism.
Saturday, October 14th, 2017
11:29 am - You can never be universally inclusive
A discussion about the article “We Don’t Do That Here” (h/t siderea) raised the question about the tension between having inclusive social norms on the one hand, and restricting some behaviors on the other hand.

At least, that was the way the discussion was initially framed. The thing is, inclusivity is a bit of a bad term, since you can never really be universally inclusive. Accepting some behaviors is going to attract people who like engaging in those behaviors while repelling people who don’t like those behaviors; and vice versa for disallowing them.

Of course, you can still create spaces that are more inclusive than others, in being comfortable to a broader spectrum of people. But the way you do that is by disallowing behaviors that would, if allowed, repel more people than the act of disallowing them does. If you use your social power to shut up people who would otherwise be loudly racist and homophobic and who then leave because they don’t want to be in a place where those kinds of behaviors aren’t allowed, then that would fit the common definition of “inclusive space” pretty well.

That said, the “excluding racists and homophobes” thing may make it sound like you’re only excluding “bad” people, which isn’t the case either. Every set of rules (including having no rules in the first place) is going to repel some completely decent people.

Like, maybe you decide to try to make a space more inclusive by having a rule like “no discussing religion or politics”. This may make the space more inclusive towards people of all kinds of religions and political backgrounds, since there is less of a risk of anyone feeling unwelcome when everyone else turns out to disagree with their beliefs.

But at the same time, you are making the space less inclusive towards people who are perfectly reasonable and respectful people, but who would like to discuss religion or politics. As well as to people who aren’t so good at self-regulation and will feel uncomfortable about having to keep a constant eye on themselves to avoid saying the wrong things.

And maybe these people would feel more comfortable at a different event with different rules, which was more inclusive towards them. Which is fine.

Competing access needs:

“Competing access needs is the idea that some people, in order to be able to participate in a community, need one thing, and other people need a conflicting thing, and instead of figuring out which need is ‘real’ we have to acknowledge that we can’t accommodate all valid needs. I originally encountered it in disability community conversations: for example, one person might need a space where they can verbally stim, and another person might need a space where there’s never multiple people talking at once. Both of these are valid, but you can’t accommodate them both in the same space.

“I wrote a while ago that I think this concept extends to a lot of activist/social justice community challenges and a lot of the difficulty of designing good messages. For example, body positivity: some people need to hear “love your body! no matter who you are you are soooo sexy” and some people really hate being told that they’re ‘sexy’. Or some gay people might need a space where it’s against the rules to ask “well, what if it actually is morally wrong to be gay?” but other gay people (like me of a few years ago) might need a space where they can ask that so there can be a serious discussion and they can become convinced that they’re okay.”

Every set of rules is going to be bad for someone, so a better question than “how to make this space inclusive” is “who do we want to make this space inclusive towards”. You’re always going to exclude some people who aren’t jerks or bad people, but would just prefer a different set of rules. And you just have to accept that.
Saturday, October 7th, 2017
5:30 pm - What are your plans for the evening of the apocalypse?
If everyone found out for sure that the world would end in five years, what would happen?

My guess is that it would take time before anything big happened. Finding out about the end of the world, that’s the kind of a thing that you need to digest for a while. For the first couple of days, people might go “huh”, and then carry on with their old routines while thinking about it.

A few months later, maybe there still wouldn’t be all that much change. Sure, people would adjust their life plans, start thinking more near-term, some would decide not to go to college after all. But a lot of people already don’t plan much beyond a couple of years; five years is a long time, and you’ll still need to pay your bills until the Apocalypse hits. So many people might just carry on with their jobs as normal; if they were already doing college, well, you need to pass the time until the end of the world somehow. Might as well keep studying.

Of course, some people would have bigger reactions, right from day one. Quit their unsatisfying job, that kind of thing. People with a lot of savings might choose this moment to start living off them. And as the end of the world got closer and closer, people might get an increasingly relaxed attitude to work; though there might also be a feeling of, we’re all in this together, let’s make our existing institutions work until the end. I could imagine doctors and nurses in a hospital, who had decided that they want to make sure the hospital runs for as long as it can, and that nobody has to die before they really have to.

But I could also imagine, say, the waiter at some restaurant carrying on, serving customers even on the night of the apocalypse. (Be sure to make a reservation, we expect to have no free tables that evening.) Maybe out of principles, maybe out of professional pride, but maybe just out of habit.

I’m guessing there would be gradual changes to society, with occasional tipping points when a lot of people decided to stop whatever they had been doing and that created a chain reaction of others doing so as well. But it seems really hard to guess for how long things would remain mostly normal.
Thursday, October 5th, 2017
11:24 am - Meaningfulness and the scope of experience