A view to the gallery of my mind
Tuesday, November 20th, 2018
9:06 pm - Incorrect hypotheses point to correct observations
Tuesday, October 23rd, 2018
1:20 pm - Mark Eichenlaub: How to develop scientific intuition
Recently on the CFAR alumni mailing list, someone asked a question about how to develop scientific intuition. In response, Mark Eichenlaub posted an excellent and extensive answer, which was so good that I asked for permission to repost it in public. He graciously gave permission, so I’ve reproduced his message below. (He otherwise retains the rights to this, meaning that the standard CC license on my blog doesn’t apply to this post.)

From: Mark Eichenlaub
Date: Tue, Oct 23, 2018 at 9:34 AM
Subject: Re: [CFAR Alumni] Suggestions for developing scientific intuition

Sorry for the length; I recently finished a PhD on this topic. (After I wrote the answer kerspoon linked, I went to grad school to study the topic.) This is specifically about solving physics problems, but hopefully speaks to intuition a bit more broadly in places.

I mostly think of intuition as the ability to quickly coordinate a large number of small heuristics. We know lots of small facts and patterns, and intuition is about matching the relevant ones onto the current situation.

The little heuristics are often pretty local and small in scope. For example, the other day I heard this physics problem: You set up a trough with water in it. You hang just barely less than half of the trough off the edge of a table, so that it balances, but even a small force at the far end would make it tip over. You put a boat in the trough at the end over the table. The trough remains balanced. Then you slowly push the boat down to the other end of the trough, so that it’s in the part of the trough that hangs out from the table. What happens? (I.e., does the trough tip over?)

The answer is (rot13): Gur gebhtu qbrf abg gvc; vg erznvaf onynaprq (nf ybat nf gur zbirzrag bs gur obng vf fhssvpvragyl fybj fb gung rirelguvat erznvaf va rdhvyvoevhz).

I knew this “intuitively”, by which I mean I got it within a second or so of understanding the question, and without putting conscious effort into thinking about it. (I wasn’t certain I was right until I had consciously thought it out, but I was reasonably confident within a second, and my intuition bore out.)

I don’t think this was due to some sort of general intuition about problem solving, science, physics, mechanics, or even floating. It felt like I could solve the problem intuitively specifically because I had seen sufficiently-similar things that led me to the specific heuristic “a floating object spreads its weight out evenly over the bottom of the container it’s floating in.” Then I think of “having intuition” in physics as having maybe a thousand little rules like that and knowing when to call on which one.

For this particular heuristic, there is a classic problem asking what happens to the water level in a lake if you are in a boat with a rock, and you throw the rock into the water and it sinks to the bottom. One solution to that problem is that when the rock is on the bottom of the lake, it exerts more force on that part of the bottom of the lake than is exerted at other places. By contrast, when the rock is still in the boat, the only thing touching the bottom of the lake is water, and the water pressure is the same everywhere, so the weight of the rock is distributed evenly across the entire lake. The total force on the bottom of the lake doesn’t change between the two scenarios (because gravity pulls on everything just as hard either way), so when the rock is sitting on the bottom of the lake and the force on the bottom of the lake is higher under the rock, it must be lower everywhere else to compensate. The pressure everywhere else is $\rho g h$, so if that goes down, the level of the lake goes down. Conclusion: when you throw the rock overboard, the level of the lake goes down a bit.
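
To make the bookkeeping explicit – a sketch, assuming a flat lake bottom of area $A$ and writing $W$ for the combined weight of water, boat, and rock (the same in both scenarios):

$$\text{rock in boat: } \rho g h_1 A = W, \qquad \text{rock on bottom: } \rho g h_2 A + F_{\text{extra}} = W,$$

where $h_1$ and $h_2$ are the two water levels and $F_{\text{extra}} > 0$ is the extra force the rock exerts on the bottom beyond the water pressure acting everywhere else. Since $F_{\text{extra}} > 0$, it follows that $h_2 < h_1$: the level of the lake drops.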
When I thought about that problem, I presumably built the “weight distributed evenly” heuristic. All I had to do was quickly apply it to the trough problem to solve that one as well. And if someone else also had a background in physics but didn’t find the trough problem easy, it’s probably because they simply hadn’t happened to think about the boat problem, or some other similar problems, in the right way, and hadn’t come away with the heuristic about the weight of floating things being spread out evenly.

To me, this picture of intuition as small heuristics doesn’t look good for the idea of developing powerful intuition. The “weight gets spread out by floating” heuristic is not likely to transfer to much else. I’ve used it for two physics problems about floating things and, as far as I know, nothing else.

You can probably think of lots of similar heuristics. For example, “conservation of expected evidence”. You might catch a mistake in someone’s reasoning, or an error in a long probability calculation you made, if you happen to notice that the argument or calculation violates conservation of expected evidence. The nice thing about this is that it can happen almost automatically. You don’t have to stop after every calculation or argument and think, “does this break conservation of expected evidence?”. Instead, you wind up learning certain triggers that you associate with the principle, which prime it in your mind; then, if it becomes relevant to the argument, you notice that and cite the principle.
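
(For reference, the principle in symbols is just the law of total probability, read as a constraint on beliefs:

$$P(H) = P(E)\,P(H \mid E) + P(\neg E)\,P(H \mid \neg E).$$

Your current credence in a hypothesis $H$ must equal your expected credence after seeing the evidence. So if observing $E$ would raise your credence in $H$, observing $\neg E$ must lower it – an argument under which every possible observation supports the same conclusion violates this identity.)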
In this picture, building intuition is about learning a large number of these heuristics, along with their triggers.

However, while the individual small heuristics are often the easiest things to point to in an intuitive solution to a problem, I do think there are more general, and therefore more transferable, parts of intuition as well. I imagine that the paragraph I wrote explaining the solution to the boat problem will be largely incomprehensible to someone who hasn’t studied physics. That’s partially because it uses concepts they won’t have a rigorous understanding of (e.g. pressure), partially because it tacitly uses small heuristics it didn’t explain (e.g. that the reason the pressure is the same along the bottom of the lake is that if it weren’t, there would be horizontal forces that push the water around until the pressure did equalize in this way), and partially because it makes simplifications that it didn’t state and that might not obviously be justified (e.g. that the bottom of the lake is flat).

More importantly, it relies on the general framework of Newtonian mechanics; there are a number of tacit applications of Newton’s laws in the argument. For example, I stated that the total force on the bottom of the lake is the same whether the rock is resting on the bottom or floating in the boat “because gravity pulls on everything just as hard either way”, but these aren’t directly connected concepts. Gravity pulls the system (boat + water + rock) down just as hard no matter where the rock is. That system is not accelerating, so by Newton’s second law, the bottom of the lake pushes up on that system just as hard in each scenario. And by Newton’s third law, the system pushes down on the bottom of the lake just as hard in each scenario.
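
(In symbols: write $W$ for the weight of the boat + water + rock system and $N$ for the upward contact force from the lake bottom on that system. The second law with $a = 0$ gives $N - W = 0$, so $N = W$; the third law then says the system presses down on the bottom with the same force $N = W$, and $W$ doesn’t depend on where the rock is.)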
So understanding the argument involves some fairly general heuristics, such as “apply Newton’s second law to an object in equilibrium to show that two forces on it have equal magnitude” – a heuristic I’ve used hundreds of times – and “decide what objects to define as part of a system fluidly as you go through a problem” (in this case, switching from thinking about the rock as a system to thinking about rock + boat + water as a single system) – a skill I’ve used hundreds to thousands of times across all of physics. (My job is to teach high schoolers to be really good at solving problems like this, so I spend way more time on it than most people; applying a heuristic specific to solving introductory physics problems in a thousand independent instances is realistic for me.)

Then there may be more meta-level skills and heuristics that you develop in solving problems. These could be things like valuing non-calculation solutions, or believing that persevering on a tough problem is worthwhile.

It’s also important that intuition isn’t just about having lots of little heuristics. It’s about organizing them and calling the right one up at the right time. You’ll have to ask yourself the right sorts of questions to prompt yourself to find the right heuristics, and that’s probably a pretty general skill.

There is a fair amount of research on trying to understand what all these little heuristics are and how to develop them, but I’m mostly familiar with the research in physics. In the Quora answer kerspoon linked, I cited George Lakoff, and I still think that he’s a good source for understanding how we go about taking primitive sorts of concepts (e.g. “up” and “down”) and using and adapting them, via partial metaphor, to understanding more abstract things.

For a specific example that’s well-argued, see: Wittmann, Michael C., and Katrina E. Black. “Mathematical actions as procedural resources: An example from the separation of variables.” Physical Review Special Topics – Physics Education Research 11.2 (2015): 020114. They argue that students understand the mathematical action “separation of variables” via analogy to their physical understanding of taking things and physically moving them around. However, I think Wittmann and Black’s work is incomplete. For example, it doesn’t explain why students using the motion analogy for separation of variables do it correctly – they could just as well use motion to encode algebraically-invalid rules. Also, they don’t explain how the analogy develops. They just catalog that it exists.

A foundational work in trying to understand the components of physical intuition is: diSessa, Andrea A. “Toward an epistemology of physics.” Cognition and Instruction 10.2-3 (1993): 105-225. This work establishes “phenomenological primitives”: little core heuristics, such as “near is more”, which serve as templates for physical reasoning. Drawing from these templates, we might conclude that the nearer you are to a speaker, the louder the sound, or that the nearer you are to the sun, the hotter it will be (and therefore that summer is hot because the Earth is nearer the sun – a false but common and reasonable belief).

That’s a long and somewhat-obscure paper. I really like the work of his student: Sherin, Bruce L. “How students understand physics equations.” Cognition and Instruction 19.4 (2001): 479-541. Like diSessa, Sherin builds his own framework for what intuition is. His scope is more limited, though, focusing solely on building and interpreting certain types of equations in a manner that combines “intuitive” physical ideas and mathematical templates. He spells this out in detail in the paper, and it’s incredibly clear and well-argued. Probably my favorite paper in the field.

A more general overview of cognition in physics, much more accessible than diSessa and broader than Sherin, is “How Should We Think About How Our Students Think?” by my advisor, Joe Redish: http://media.physics.harvard.edu/video/?id=COLLOQ_REDISH_093013 (video), https://arxiv.org/abs/1308.3911 (paper).

The actual process of building new heuristics is also studied, but overall I don’t think we know all that much. See my friend Ben’s paper – Dreyfus, Benjamin W., Ayush Gupta, and Edward F. Redish. “Applying conceptual blending to model coordinated use of multiple ontological metaphors.” International Journal of Science Education 37.5-6 (2015): 812-838 – for an example of theory-building around how we create new intuitions. He calls on a framework from cognitive science called “conceptual blending” that is rather formal, but I think pretty entertaining to read.

A relevant search term in the education literature is “conceptual change”, but I find a lot of this literature to be hard to follow and not always a productive use of time to read.

On the applied side, I think the state of the art in evidence-backed approaches to building intuition, at least in physics, is modeling instruction. I’m not sure what the best introduction to modeling instruction is. They have a website that seems okay. Eric Brewe writes on it and he’s usually very good. The basic idea is to have students collaboratively participate in the building of the theories of physics they’re using (in a specific way, with guidance and direction from a trained instructor), which gets them to think about the “whys” involved with a particular theory or model in a way they usually wouldn’t.

I have written some about why I think things like checking the extreme cases of a formula are powerful intuition-building tools. A preprint is available here: https://arxiv.org/pdf/1804.01639.pdf
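
(A generic example of this kind of check – an illustration, not taken from the preprint: the range of a projectile launched at speed $v$ and angle $\theta$ is $R = v^2 \sin(2\theta)/g$. Checking extreme cases, $\theta = 0°$ and $\theta = 90°$ both give $R = 0$, as they should; checking dimensions, $v^2/g$ has units of $(\mathrm{m}^2/\mathrm{s}^2)/(\mathrm{m}/\mathrm{s}^2) = \mathrm{m}$, a length, as a range must be.)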
However, I think it’s dangerous to have rules like “always check the dimensions of your answer”, “always check the extreme cases of a formula”, or even “always check that the numbers come out reasonable.” The reason is that having these things as procedures tends to encourage students to follow them by rote. A large part of the cognitive work involved isn’t in checking the extreme cases or the dimensions, but in realizing that in this particular situation, that would be a good thing to do. If you’re doing it only because an external prompt is telling you to, you aren’t building the appropriate meta-level habits. See https://www.tandfonline.com/doi/abs/10.1080/09500693.2017.1308037 for an example of this effect. See papers on “metarepresentation” by diSessa and/or Sherin for another example of generalizable skills related to intuition and problem solving.

Unfortunately, I don’t think writing books well or writing courses of individual study is something we know much about. I don’t know anyone who has a significant grant for that; the most I’ve ever seen on it is a poster here or there at a conference. Generally, grants are awarded for improving high school and college courses, for professional development programs, for supporting department- or institution-level changes at schools, etc. So adults who just want to learn on their own are not really served much by the research in the area.

If you’re an adult who wants to self-study theoretical physics with an eye towards intuition, I recommend Leonard Susskind’s series of courses “The Theoretical Minimum” (the first three courses exist as books, the rest only as video lectures). He approaches mathematical topics with what I find an intuitive approach in most cases. Of course, the Feynman Lectures on Physics are also very good.

I’ll be building an introduction to physics course at Art of Problem Solving, starting work sometime this winter. It might be available in the spring, although students will mostly be middle and high school students (but anyone is welcome to take our courses). I currently teach an advanced physics problem-solving course at AoPS called “PhysicsWOOT”. I try to support intuition-building practices there, but the main aim is training the many small heuristics which students need to solve contest problems.

There should be something like modeling instruction for adult independent learners, but I don’t know of it.
Tuesday, October 9th, 2018
7:40 am - On insecurity as a friend
Sunday, August 12th, 2018
9:04 am - New paper: Long-Term Trajectories of Human Civilization
Long-Term Trajectories of Human Civilization (free PDF). Foresight, forthcoming, DOI 10.1108/FS-04-2018-0037.

Authors: Seth D. Baum, Stuart Armstrong, Timoteus Ekenstedt, Olle Häggström, Robin Hanson, Karin Kuhlemann, Matthijs M. Maas, James D. Miller, Markus Salmela, Anders Sandberg, Kaj Sotala, Phil Torres, Alexey Turchin, and Roman V. Yampolskiy.

Abstract

Purpose: This paper formalizes long-term trajectories of human civilization as a scientific and ethical field of study. The long-term trajectory of human civilization can be defined as the path that human civilization takes during the entire future time period in which human civilization could continue to exist.

Approach: We focus on four types of trajectories: status quo trajectories, in which human civilization persists in a state broadly similar to its current state into the distant future; catastrophe trajectories, in which one or more events cause significant harm to human civilization; technological transformation trajectories, in which radical technological breakthroughs put human civilization on a fundamentally different course; and astronomical trajectories, in which human civilization expands beyond its home planet and into the accessible portions of the cosmos.

Findings: Status quo trajectories appear unlikely to persist into the distant future, especially in light of long-term astronomical processes. Several catastrophe, technological transformation, and astronomical trajectories appear possible.

Value: Some current actions may be able to affect the long-term trajectory. Whether these actions should be pursued depends on a mix of empirical and ethical factors. For some ethical frameworks, these actions may be especially important to pursue.

An excerpt from the press release over at the Global Catastrophic Risk Institute:

Society today needs greater attention to the long-term fate of human civilization. Important present-day decisions can affect what happens millions, billions, or trillions of years into the future. The long-term effects may be the most important factor for present-day decisions and must be taken into account. An international group of 14 scholars calls for the dedicated study of “long-term trajectories of human civilization” in order to understand long-term outcomes and inform decision-making. This new approach is presented in the academic journal Foresight, where the scholars have made an initial evaluation of potential long-term trajectories and their present-day societal importance.

“Human civilization could end up going in radically different directions, for better or for worse. What we do today could affect the outcome. It is vital that we understand possible long-term trajectories and set policy accordingly. The stakes are quite literally astronomical,” says lead author Dr. Seth Baum, Executive Director of the Global Catastrophic Risk Institute, a non-profit think tank in the US.

The group of scholars, including Olle Häggström, Robin Hanson, Karin Kuhlemann, Anders Sandberg, and Roman Yampolskiy, has identified four types of long-term trajectories: status quo trajectories, in which civilization stays about the same; catastrophe trajectories, in which civilization collapses; technological transformation trajectories, in which radical technology fundamentally changes civilization; and astronomical trajectories, in which civilization expands beyond our home planet.

Available here: https://kajsotala.fi/assets/2018/08/trajectories.pdf
Friday, August 3rd, 2018
3:00 pm - Finland Museum Tour 1/??: Tampere Art Museum
Sunday, March 18th, 2018
10:15 am - Is the Star Trek Federation really incapable of building AI?
In the Star Trek universe, we are told that it’s really hard to make genuine artificial intelligence, and that Data is so special because he’s a rare example of someone having managed to create one. But this doesn’t seem to be the best hypothesis for explaining the evidence that we’ve actually seen. Consider:

* In the TOS episode “The Ultimate Computer”, the Federation has managed to build a computer intelligent enough to run the Enterprise on its own, but it goes crazy and Kirk has to talk it into self-destructing.
* In TNG, we find out that before Data, Doctor Noonien Soong had built Lore, an android with sophisticated emotional processing. However, Lore became essentially evil and had no problems killing people for his own benefit. Data worked better, but in order to get his behavior right, Soong had to initially build him with no emotions at all. (TNG: “Datalore”, “Brothers”)
* In the TNG episode “Evolution”, Wesley is doing a science project with nanotechnology, accidentally enabling the nanites to become a collective intelligence which almost takes over the ship before the crew manages to negotiate a peaceful solution with them.
* The holodeck seems entirely capable of running generally intelligent characters, though their behavior is usually restricted to specific roles. However, on occasion they have strayed outside their normal parameters, to the point of attempting to take over the ship. (TNG: “Elementary, Dear Data”) It is also suggested that the computer is capable of running an indefinitely long simulation good enough to make an intelligent being believe it is the real universe. (TNG: “Ship in a Bottle”)
* The ship’s computer in most of the series seems potentially quite intelligent, but most of that intelligence isn’t used for anything other than running holographic characters. In the TNG episode “Booby Trap”, a potential way of saving the Enterprise from the Disaster Of The Week would involve turning over control of the ship to the computer: however, the characters are inexplicably super-reluctant to do this.
* In Voyager, the Emergency Medical Hologram clearly has general intelligence: however, it is only supposed to be used in emergency situations rather than running long-term, its memory starting to degrade after a sufficiently long period of continuous use. The recommended solution is to reset it, removing all of the accumulated memories since its first activation. (VOY: “The Swarm”)

There seems to be a pattern here: if an AI is built to carry out a relatively restricted role, then things work fine. However, once it is given broad autonomy and gets to do open-ended learning, there’s a very high chance that it gets out of control. The Federation witnessed this for the first time with the Ultimate Computer. Since then, they have been ensuring that all of their AI systems are restricted to narrow tasks, or that they’ll only run for a short time in an emergency, to avoid things getting out of hand. Of course, this doesn’t change the fact that more intelligence in your AI is generally useful, so e.g. starship computers are equipped with powerful general intelligence capabilities, which sometimes do get out of hand.

Dr. Soong’s achievement with Data was not in building a general intelligence, but in building a general intelligence which didn’t go crazy. (And before Data, he failed at that task once, with Lore.) The Federation’s issue with AI is not that they haven’t solved artificial general intelligence.
The Federation’s issue is that they haven’t reliably solved the AI alignment problem.
Monday, February 12th, 2018
11:33 am - Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.

The main idea here is that most of the discussion about risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to miss alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios that have lots of burdensome details.

To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.
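
To see why this matters, consider a toy calculation (the numbers are made up for illustration, and the independence assumption is a simplification): if there are $n$ independent paths to a bad outcome, each with probability $P(A_i)$, then

$$P(\text{at least one path is realized}) = 1 - \prod_{i=1}^{n}\bigl(1 - P(A_i)\bigr).$$

Ten independent paths of 5% each already give $1 - 0.95^{10} \approx 40\%$: a conclusion can be robust even if every individual scenario supporting it is unlikely.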

Superintelligence versus Crucial Capabilities

Bostrom’s Superintelligence, as well as a number of other sources, basically make the following argument:

1. An AGI could become superintelligent
2. Superintelligence would enable the AGI to take over the world

This is an important argument to make and analyze, since superintelligence basically represents an extreme case: if an individual AGI can become as powerful as it gets, how do we prepare for that eventuality? As long as there is a plausible chance for such an extreme case to be realized, it must be taken into account.

However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason why we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI which also had the cognitive capabilities necessary for taking over the world, and only those?

Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.

That raises the question of what might be such crucial capabilities. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, biological warfare. Others no doubt exist.

A possibly useful framing for future investigations might be: “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous?”, where the definition of “dangerous” is free to vary based on how serious a risk we are concerned about. One complication is that this is a highly contextual question – with a superintelligence we can assume that the AGI is basically omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk: there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.

Incidentally, this shows one reason why the common criticism of “superintelligence isn’t something that we need to worry about because intelligence isn’t unidimensional” is ill-founded – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just the ones we care about.

How would the AGI get free and powerful?

In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.

However, there are also alternative scenarios where the AGI’s creators voluntarily let it free – or even place it in control of e.g. a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their jobs by going against the advice.

Would the Treacherous Turn involve a Decisive Strategic Advantage?

Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?

Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if they prepare for long enough. But the longer they prepare, the more likely it becomes that the other AGI sets its plans in motion first and achieves an advantage. Thus, if several AGI projects exist, each AGI is incentivized to take action at the point which maximizes its overall probability of success – even if it only has rather slim chances of succeeding in the takeover, as long as it thinks that waiting longer would make those chances even worse.

Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.

A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.

“Mere” catastrophic risks could also turn into existential ones if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged, and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.

Summary table and example scenarios

The list below summarizes the various alternatives explored in the paper.

* AI’s level of strategic advantage: decisive; major.
* AI’s capability threshold for non-cooperation: very low to very high, depending on various factors.
* Sources of AI capability: individual takeoff (hardware overhang, speed explosion, intelligence explosion); collective takeoff; crucial capabilities (biowarfare, cyberwarfare, social manipulation, something else); gradual shift in power.
* Ways for the AI to achieve autonomy: escape (social manipulation, technical weakness); voluntarily released (economic or competitive reasons, criminal or terrorist reasons, ethical or philosophical reasons, desperation, confidence in the AI’s lack of capability or in its values); confined but effectively in control.
* Number of AIs: single; multiple.

And here are some example scenarios formed by different combinations of them:

The classic takeover

(Decisive strategic advantage, high capability threshold, intelligence explosion, escaped AI, single AI)

The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.

(Major strategic advantage, high capability threshold, gradual shift in power, released for economic reasons, multiple AIs)

Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but by now a lot of infrastructure has been built around them, they bring in a profit, and they’re really good at giving us nice things – for the time being.

The wars of the desperate AIs

(Major strategic advantage, low capability threshold, crucial capabilities, escaped AIs, multiple AIs)

Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.

Is humanity feeling lucky?

(Decisive strategic advantage, high capability threshold, crucial capabilities, confined but effectively in control, single AI)

Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.

This blog post was written as part of work for the Foundational Research Institute.


Thursday, January 25th, 2018
7:25 pm - On not getting swept away by mental content
Thursday, January 4th, 2018
9:53 am - Papers for 2017
Friday, December 8th, 2017
2:10 pm - Fixing science via a basic income
Monday, December 4th, 2017
1:02 pm - Book review: The Upside of Your Dark Side: Why Being Your Whole Self–Not Just Your “Good” Self–Drives Success and Fulfillment
Monday, November 6th, 2017
1:05 pm - Meditation and mental space
One effect that I often notice after my meditation practice has been interrupted and I then manage to resume it is an increase in a kind of mental resilience.

When my resilience is low, feeling bad for any reason feels much more like an emergency: it’s something that forces itself into my consciousness, takes over, and refuses to go away. I would like to ignore it, but I can’t; as long as it’s there, it’s hard to think of anything else.

When my resilience is higher, it’s like my mind has more room for thoughts and emotions. Something might be making me feel bad, but something else might also be making me feel good, and there’s space for those two to intermingle. It becomes much easier to accept that I’m feeling a little bad but don’t need to do anything about it. I can just go on and do something else, and the nasty feeling might go away on its own – or if it doesn’t, that’s fine too. Interestingly, being on antidepressants can give me a similar effect.

In itself, this kind of effect isn’t too surprising, given that it’s one of the explicit goals of the practice. Culadasa’s The Mind Illuminated notes that two of the goals of mindfulness practice are an increase in the amount of “conscious power” (roughly, the amount of things that can be consciously processed at a time), as well as learning to shift the focus of attention more intentionally, so that it won’t just automatically go to the most painful or pleasant thing and become preoccupied with it, but can rather be directed in a more useful manner. Still, it’s nice to see that the practice is bearing fruit.
Tuesday, October 17th, 2017
10:19 am - Anti-tribalism and positive mental health as high-value cause areas
I think that tribalism is one of the biggest problems with humanity today, and that even small reductions in it could cause a massive boost to well-being.

By tribalism, I basically mean the phenomenon where arguments and actions are primarily evaluated based on who makes them and which group they seem to support, not on anything else. E.g. if a group thinks that X is bad, then it’s often seen as outright immoral to make an argument which would imply that X isn’t quite as bad, or that some things which are classified as X would be more correctly classified as non-X instead. I don’t want to give any specific examples so as not to derail the discussion, but hopefully everyone can think of some; the article “Can Democracy Survive Tribalism” lists a lot of them, picked from various sides of the political spectrum.

Joshua Greene (among others) argues in his book Moral Tribes that tribalism exists for the purpose of coordinating aggression and alliances against other groups (so that you can kill them and take their stuff, basically). It specifically exists for the purpose of making you hurt others, as well as defend yourself against people who would hurt you. And while defending yourself against people who would hurt you is clearly good, attacking others is clearly not. And everything being viewed in tribal terms means that we can’t make much progress on things that actually matter: as someone commented, “people are fine with randomized controlled trials in policy, as long as the trials are on things that nobody cares about”.

Given how deep tribalism sits in the human psyche, it seems unlikely that we’ll be getting rid of it anytime soon. That said, there do seem to be a number of things that affect the amount of tribalism we have:

* As Steven Pinker argues in The Better Angels of Our Nature, violence in general has declined over historical time, replaced by more cooperation and an assumption of human rights; Democrats and Republicans may still hate each other, but they generally agree that they shouldn’t be killing each other.
* As a purely anecdotal observation, I get the feeling that people on the autism spectrum tend to be less tribal, up to the point of not being able to perceive tribes at all. (This suggests, somewhat oddly, that the world would actually be a better place if everyone was slightly autistic.)
* Feelings of safety or threat seem to play a lot into feelings of tribalism: if you perceive (correctly or incorrectly) that a group Y is out to get you and that they are a real threat to you, then you will react much more aggressively to any claims that might be read as supporting Y. Conversely, if you feel safe and secure, then you are much less likely to feel the need to attack others.

The last point is especially troublesome, since it can give rise to self-fulfilling predictions. Say that Alice says something to Bob, and Bob misperceives this as an insult; Bob feels threatened and snaps at Alice, and now Alice feels threatened as well, so she shouts back. The same kind of phenomenon seems to be going on at a much larger scale: whenever someone perceives a threat, they are no longer willing to give others the benefit of the doubt, and would rather treat the other person as an enemy. (This isn’t too surprising, since it makes evolutionary sense: if someone is out to get you, then the cost of misclassifying them as a friend is much bigger than the cost of misclassifying a would-be friend as an enemy. You can always find new friends, but it only takes one person to get close to you and hurt you really badly.)
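
In rough expected-cost terms (a back-of-the-envelope formalization of that asymmetry; the symbols are just for illustration): if $p$ is the probability that the other person is hostile, $C_E$ the cost of treating an enemy as a friend, and $C_F$ the cost of treating a friend as an enemy, then hostility is the “safer” response whenever

$$p\,C_E > (1 - p)\,C_F, \quad\text{i.e.}\quad p > \frac{C_F}{C_E + C_F}.$$

If $C_E \gg C_F$, that threshold is close to zero – even a faint suspicion of threat is enough to tip the response toward treating someone as an enemy.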
One implication might be that general mental health work – not only in the conventional sense of “healing disorders”, but also positive psychology-style mental health work that actively seeks to make people happy rather than just fine – could be even more valuable for society than we’ve previously thought. Curing depression and the like would be enormously valuable even by itself, but if we could figure out how to make people generally happier and more resilient to negative events, then fewer things would threaten their well-being, they would perceive fewer things as threats, and tribalism would be reduced.
Saturday, October 14th, 2017
11:29 am - You can never be universally inclusive