
A view to the gallery of my mind


Saturday, May 14th, 2016
11:39 am - Smile, You Are On Tumblr.Com

I made a new tumblr blog. It has photos of smiling people! With more to come!

Why? Previously I happened to need pictures of smiles for a personal project. After going through an archive of photos for a while, I realized that looking at all the happy people made me feel really happy and good. So I thought that I might make a habit out of looking at photos of smiling people, and sharing them.

Follow for a regular extra dose of happiness!


Wednesday, April 27th, 2016
9:52 am - Decisive Strategic Advantage without a Hard Takeoff (part 1)

A common question when discussing the social implications of AI is whether to expect a soft takeoff or a hard takeoff. In a hard takeoff, an AI grows to superhuman levels of intelligence within a relatively short time, becoming impossible for mere humans to control.

Essentially, a hard takeoff allows the AI to achieve a so-called decisive strategic advantage (DSA) – “a level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014) – in a very short time. The main relevance of this is that if a hard takeoff is possible, then it becomes much more important to get the AI’s values right on the first try – once the AI has undergone a hard takeoff and achieved a DSA, it is in control, with whatever values we happened to give it.

However, if we wish to find out whether an AI might rapidly acquire a DSA, then the question of “soft takeoff or hard” seems too narrow. A hard takeoff would be sufficient, but not necessary, for rapidly acquiring a DSA. The more relevant question is: which competencies does the AI need to master, and at what level relative to humans, in order to acquire a DSA?

Considering this question in more detail reveals a natural reason why most previous analyses have focused on a hard takeoff specifically. Plausibly, for the AI to acquire a DSA, its level in some offensive capability must overcome humanity’s defensive capabilities. A hard takeoff presumes that the AI becomes so vastly superior to humans in every respect that this kind of advantage can be taken for granted.

As an example scenario which does not require a hard takeoff, suppose that an AI achieves a biowarfare offense capability that overpowers humanity’s biowarfare defenses, along with moderate logistics and production skills. It releases deadly plagues that decimate human society, then uses legally purchased drone factories to build up its own infrastructure and to take over abandoned human facilities.

There are several interesting points to note in conjunction with this scenario:

Attack may be easier than defense. Bruce Schneier writes that

Attackers generally benefit from new security technologies before defenders do. They have a first-mover advantage. They’re more nimble and adaptable than defensive institutions like police forces. They’re not limited by bureaucracy, laws, or ethics. They can evolve faster. And entropy is on their side — it’s easier to destroy something than it is to prevent, defend against, or recover from that destruction.

For the most part, though, society still wins. The bad guys simply can’t do enough damage to destroy the underlying social system. The question for us is: can society still maintain security as technology becomes more advanced?

A single plague, once it has evolved or been developed, can require multi-million-dollar responses to contain. At the same time, it may be trivial to produce, especially using robots that do not need to fear infection. And creating new variants as new vaccines are developed may be quite easy, requiring the creation – and distribution – of yet more vaccines.

Another point that Schneier has made is that in order to keep something protected, the defenders have to succeed every time, whereas the attacker only needs to succeed once. Defending may be particularly hard if the attacker can develop an attack that nobody has used before, such as hijacked airplanes being used against major buildings in the 9/11 attacks, or the various vulnerabilities that the Snowden leaks revealed the NSA to have been exploiting for extensive eavesdropping.

Obtaining a DSA may not require extensive intelligence differences. Debates about takeoff scenarios often center around questions such as whether a self-improving AI would quickly hit diminishing returns, and how much room for improvement there is beyond the human level of intelligence. However, these questions may be irrelevant: especially if attack is easier than defense, only a relatively small edge in some crucial competency (such as biological warfare) may be enough to give the AI a DSA.

Exponential growth in the form of normal economic growth may not have produced astounding “fooms” yet, but it has produced plenty of situations where one attacker has gained a temporary advantage over others.

The less the AI cares about human values, the more destructive it may be. An AI which cares mainly about calculating the digits of pi may be willing to destroy human civilization in order to eliminate a potential threat to itself, ensuring that it can go on calculating the maximum number of digits unimpeded.

However, an AI which was programmed to maximize something like the “happiness of currently-living humans” may be much less willing to risk substantial human deaths. This would force it to focus on less destructive takeover methods, potentially requiring more sophisticated abilities.

It is worth noting that this only applies to AIs whose values are defined in terms of how they affect currently existing humans. An AI that was only maximizing human happiness in general might be willing to destroy all existing humans, and then recreate large numbers of humans in simulations.

In effect, the AI’s values determine the level of intelligence it needs in order to achieve the kind of DSA that’s useful for its purposes.

Any destructive plan requires the ability to rebuild afterwards. It would not be of much use for the AI to destroy all of human civilization if it was dependent on electricity from human-run power plants and incapable of building or running its own. Thus, purely offensive capabilities will need to be paired with whatever rebuilding capacities are necessary after the offensive capabilities have been deployed.

This calculation may be upset if the AI believes itself to be in immediate danger of destruction by humans, and believes that its values will still be more likely to be met in a universe where it continues to exist, even if that requires risking a universe where it cannot rebuild after deploying its attack. Thus, being threatened may force the AI’s hand and cause it to launch a potentially destructive offense even when it’s uncertain of its capability to rebuild.

The rational level of aggressiveness depends on the extent to which the AI can selectively target human resources. Human nations generally avoid creating engineered pandemics and using them against their enemies, because they know that the pandemic could easily spread back to them. An AI with no biological components might be immune to this consideration, allowing it to deploy biological weapons more freely. On the other hand, the AI might e.g. need electricity, a dependence which humans do not share and which might give them an advantage in some situations.

A way to formulate this is that attacks differ in the extent to which they can be selectively targeted. Traditional firearms only damage the targets they are fired upon, whereas pandemics potentially threaten all the members of a species that they can infect. To the extent that the AI needs to rely on the human economy to produce resources that it needs to survive, attacks threatening the economy also threaten the AI’s resources; these resources are in a sense shared between the AI and humanity, so any attacks which cause indiscriminate damage to those resources are dangerous for both. The more the AI can design attacks which selectively deprive its opponents of resources, the lower its threshold for using them.

This blog post was written as part of research funded by the Foundational Research Institute.


Friday, April 22nd, 2016
6:07 am - Simplifying the environment: a new convergent instrumental goal

Convergent instrumental goals (also called basic AI drives) are goals that are useful for pursuing almost any other goal, and are thus likely to be pursued by any agent that is intelligent enough to understand why they’re useful. They are interesting because they may allow us to roughly predict the behavior of even AI systems that are much more intelligent than we are.

Instrumental goals are also a strong argument for why sufficiently advanced AI systems that were indifferent to human values could be dangerous to humans even if they weren’t actively malicious: instrumental goals such as self-preservation or resource acquisition could come into conflict with human well-being. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

I’ve thought of a candidate for a new convergent instrumental drive: simplifying the environment to make it more predictable in a way that aligns with your goals.

Motivation: the more interacting components there are in the environment, the harder it is to predict. Go is a harder game than chess because the number of possible moves is larger, and because even a single stone can influence the game in a drastic fashion that’s hard to know in advance. Simplifying the environment makes it possible to navigate using fewer computational resources; this drive could thus be seen as a subdrive of either the cognitive enhancement or the resource acquisition drive. Some examples of this kind of simplification, in both humans and current AI systems:


  • Game-playing AIs such as AlphaGo trading expected points for lower variance, by making moves that “throw away” points but simplify the game tree and make it easier to compute (see the sketch after this list).
  • Programmers building increasing layers of abstraction that hide the details of the lower levels and let the programmers focus on a minimal number of moving parts.
  • People acquiring insurance in order to eliminate unpredictable financial swings, sometimes even when they know that the insurance has lower expected value than not buying it.
  • Humans constructing buildings with controlled indoor conditions and a stable “weather”.
  • “Better the devil you know”; many people being generally averse to change, even when the changes could quite well be a net benefit; status quo bias.
  • Ambiguity intolerance in general being a possible adaptation that helps “implement” this drive in humans.
  • Arguably, the homeostasis maintained by e.g. human bodies is a manifestation of this drive, in that having a standard environment inside the body reduces evolution’s search space when looking for beneficial features.
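
To make the AlphaGo example above a bit more concrete, here is a minimal toy sketch of an agent trading expected points for predictability. This is not AlphaGo’s actual algorithm (AlphaGo maximizes an estimated win probability, which produces this behavior as a side effect), and the candidate moves and numbers below are invented purely for illustration:

```python
# Toy illustration of trading expected value for lower variance.
# The moves and numbers are hypothetical, not taken from any real Go engine.

candidate_moves = {
    # move: (expected_point_gain, outcome_variance)
    "invade_corner": (6.0, 9.0),   # big potential gain, hard-to-predict fight
    "solid_connect": (3.5, 1.0),   # smaller gain, simple and predictable
    "big_reduction": (5.0, 6.0),
}

def pick_move(moves, risk_aversion=0.5):
    """Pick the move that maximizes expected gain minus a variance penalty."""
    return max(moves, key=lambda m: moves[m][0] - risk_aversion * moves[m][1])

print(pick_move(candidate_moves))        # risk-averse agent picks "solid_connect"
print(pick_move(candidate_moves, 0.0))   # variance-indifferent agent picks "invade_corner"
```

The point is only that an agent which penalizes variance will prefer positions that are easier to predict, even at some cost in expected value.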

Hammond, Converse & Grass (1995) previously discussed a similar idea, the “stabilization of environments”, according to which AI systems might be built to “stabilize” their environments so as to make them more suited for themselves, and to be easier to reason about. They listed a number of categories:

  • Stability of location: The most common type of stability that arises in everyday activity relates to the location of commonly used objects. Our drinking glasses end up in the same place every time we do dishes. Our socks are always together in a single drawer. Everything has a place and we enforce everything ending up in its place.
  • Stability of schedule: Eating dinner at the same time every day or having preset meetings that remain stable over time are two examples of this sort of stability. The main advantage of this sort of stability is that it allows for very effective projection in that it provides fixed points that do not have to be reasoned about. In effect, the fixed nature of certain parts of an overall schedule reduces the size of the problem space that has to be searched.
  • Stability of resource availability: Many standard plans have a consumable resource as a precondition. If the plans are intended to be used frequently, then availability of the resource cannot be assumed unless it is enforced. A good result of this sort of enforcement is when attempts to use a plan that depends on it will usually succeed. The ideal result is when enforcement is effective enough that the question of availability need not even be raised in connection with running the plan.
  • Stability of satisfaction: Another type of stability that an agent can enforce is that of the goals that he tends to satisfy in conjunction with each other. For example, people living in apartment buildings tend to check their mail on the way into their apartments. Likewise, many people will stop at a grocery store on the way home from work. In general, people develop habits that cluster goals together into compact plans, even if the goals are themselves unrelated.
  • Stability of plan use: We often find ourselves using familiar plans to satisfy goals even in the face of wide-ranging possibilities. For example, when one of us travels to conferences, he tends to schedule his flight in to a place as late as he can and plans to leave as late as he can on the last day. This optimizes his time at home and at the conference. It also allows him to plan without knowing anything about the details of the conference schedule. As a result, he has a standard plan that he can run in a wide range of situations without actually planning for them in any detail. It works, because it already deals with the major problems (missing classes at home and important talks at the conference) as part of its structure.
  • Stability of cues: One effective technique for improving plan performance is to improve the proper activation of a plan rather than improve the plan itself. For example, placing an important paper that needs to be reviewed on his desk before going home improves the likelihood that an agent will see and read it the next day. Marking calendars and leaving notes serves the same sort of purpose.

As with other goals, there can be exceptions. In particular, adversarial situations may provide an incentive to increase the complexity. For example, Go players who are losing a game will sometimes intentionally complicate the position, making the game harder for their opponent to figure out, in the hope of getting lucky. And if the AI thought that it had a better ability to deal with the complexity than its competitors did, then it would have an incentive to maintain or even increase that complexity.

Possible implications:

  • AIs trying to take over the world may seek to simplify it, though it’s not clear exactly what form this would take – e.g. if the world was previously relatively well understood (say, by tapping into existing human research), destructive actions can actually make it harder to understand, by destroying existing dynamics and creating an environment that hasn’t been studied as much. Similarly, humans often “simplify” their environment by creating additional structure (layers of abstraction, insurance contracts, houses).
  • In principle, the AI might also seek to complicate the world, in order to make it harder for its human adversaries to understand its plans. However, the world is already so complicated that it seems hard to make any take-over plan work reliably, even for a superintelligence. That said, the AI could still complicate any given subcomponent of its plans that might be thwarted by an opponent capable of predicting what the AI was doing.
  • This drive suggests that tasks of any nontrivial complexity require a large amount of trial and error; even something as simple as a 19×19 Go game requires playing many games and building up an intuition for it, for humans and AIs alike. Increasing the amount of complexity in the world might therefore make it harder for AIs to take over…
  • …but humans tend to build additional complexity that makes the world easier to understand and control, so it’s not clear whether the world moves in a direction that’s easier or harder to take over.
  • These ideas might suggest the creation of “active honeypots”, defensive measures that considerably increase the complexity of modeling the human response and which the AI might need to eliminate in order to execute a takeover, thus tipping off the humans.

This blog post was written as part of research funded by the Foundational Research Institute.


Friday, April 15th, 2016
11:48 am - AI risk model: single or multiple AIs?

EDIT April 20th: Replaced original graph with a clearer one.

My previous posts have mostly discussed a scenario where a single AI becomes powerful enough to threaten humanity. However, there is no reason to focus only on the single-AI scenario. Depending on our assumptions, a number of AIs could also emerge at around the same time. Here are some considerations.

A single AI

The classic AI risk scenario. Some research group makes major headway in developing AI, and no others seem to be within reach. For an extended while, it is the success or failure of this one group that matters.

This would seem relatively unlikely to persist, given the current fierce competition in the AI scene. Whereas a single company could conceivably achieve a major lead in a rare niche with little competition, this seems unlikely to be the case for AI.

A possible exception might be if a company managed to monopolize the domain entirely, or if it had development resources that few others did. For example, companies such as Google and Facebook currently have access to large machine learning datasets that few others do. On the other hand, dependence on such huge datasets is a quirk of current machine learning techniques – an AGI would need the ability to learn from much smaller amounts of data. A more plausible crucial asset might be something like supercomputing resources – possibly the first AGIs will need massive amounts of computing power.

Bostrom (2016) discusses the impact of openness on AI development. He notes that if there is a large degree of openness, and everyone has access to the same algorithms, then hardware may become the primary limiting factor. If the hardware requirements for AI were relatively low, then high openness could lead to the creation of multiple AIs. On the other hand, if hardware was the primary limiting factor and large amounts of hardware were needed, then a few wealthy organizations might be able to monopolize AI for a while.

Branwen (2015) has suggested that hardware production relies on a small number of centralized factories that would make easy targets for regulation. This suggests a possible route by which AI might become amenable to government regulation, limiting the number of AIs deployed.

Similarly, there have been various proposals of government and international regulation of AI development. If successfully enacted, such regulation might limit the number of AIs that were deployed.

Another possible crucial asset would be the possession of a non-obvious breakthrough insight, one which would be hard for other researchers to come up with. If this was kept secret, then a single company might plausibly gain a major lead over others. [how often has something like this actually happened in a non-niche field?]

The plausibility of the single-AI scenario is also affected by the length of the takeoff. If one presumes a takeoff that only takes a few months, then a single-AI scenario seems more likely. Successful AI containment procedures may also increase the chances of there being multiple AIs, as the first AIs remain contained, allowing other projects to catch up.

Multiple collaborating AIs

A different scenario is one where a number of AIs exist, all pursuing shared goals. This seems most likely to come about if all the AIs are created by the same actor. This scenario is noteworthy because the AIs do not necessarily need to be superintelligent individually, but they may have a superhuman ability to coordinate and put the interest of the group above individual interests (if they even have anything that could be called an individual interest).

This possibility raises the question – if multiple AIs collaborate and share information between each other, to such an extent that the same data can be processed by multiple AIs at a time, how does one distinguish between multiple collaborating AIs and one AI composed of many subunits? This is arguably not a distinction that would “cut reality at the joints”, and the difference may be more a question of degree.

The distinction likely makes more sense if the AIs cannot completely share information between each other, such as because each of them has developed a unique conceptual network, and cannot directly integrate information from the others but has to process it in its own idiosyncratic way.

Multiple AIs with differing goals

A situation with multiple AIs that did not share the same goals could occur if several actors reached the capability for building AIs around the same time. Alternatively, a single organization might deploy multiple AIs intended to achieve different purposes, which might come into conflict if measures to enforce cooperativeness between them failed or were never deployed in the first place (maybe because of an assumption that they would have non-overlapping domains).

One effect of having multiple groups developing AIs is that it may remove the possibility of pausing to pursue further safety measures before deploying the AI, or of deploying an AI with safeguards that reduce performance (Bostrom 2016). If the actor that deploys the most effective AI earliest can dominate those who take more time, then the more safety-conscious actors may never have the time to deploy their AIs.

Even if none of the AI projects chose to deploy their AIs carelessly, the more AI projects there are, the more likely it becomes that at least one of them will have their containment procedures fail.

The possibility has been raised that having multiple AIs with conflicting goals would be a good thing, in that it would allow humanity to play the AIs against each other. This seems far from obvious, for it is not clear why humans wouldn’t simply be caught in the crossfire. In a situation with superintelligent agents around, it seems more likely that humans would be the ones being played.

Bostrom (2016) also notes that unanticipated interactions between AIs already happen even with very simple systems, such as the interactions that led to the Flash Crash, and that AIs which reasoned in non-human ways could, in particular, be very difficult for humans to anticipate once they started basing their behavior on what the other AIs did.

A model with assumptions


Here’s a new graphical model of an AI scenario, embodying a specific set of assumptions. This one takes a look at some of the factors that influence whether there might be a single AI or several.

This model both makes a great number of assumptions AND leaves out many important ones! For example, although I discussed openness above, openness is not explicitly included in this model. By sharing this, I’m hoping to draw commentary on 1) which assumptions people feel are the most shaky and 2) which additional ones are valid and should be explicitly included. I’ll focus on those in future posts.

Written explanations of the model:

We may end up in a scenario where there is (for a while) only a single or a small number of AIs if at least one of the following is true:

  • The breakthrough needed for creating AI is highly non-obvious, so that it takes a long time for competitors to figure it out
  • AI requires a great amount of hardware and only a few of the relevant players can afford to run it
  • There is effective regulation, only allowing some authorized groups to develop AI

We may end up with effective regulation at least if:

  • AI requires a great amount of hardware, and hardware is effectively regulated

(this is not meant to be the only way by which effective regulation can occur, just the only one that was included in this flowchart)

We may end up in a scenario where there are a large number of AIs if:

  • There is a long takeoff and competition to build them (i.e. ineffective regulation)

If there are few AIs, and the people building them take their time to invest in value alignment and/or are prepared to build AIs that are value-aligned even if that makes them less effective, then there may be a positive outcome.

If the people building AIs do not do these things, then the AIs will not be value-aligned and there may be a negative outcome.

If there are many AIs, and some people are ready to invest time and efficiency into value-aligned AI, then those AIs may be outcompeted by AIs whose creators did not invest in those things, and there may be a negative outcome.

Not displayed in the diagram because it would have looked messy:

  • If there’s a very short takeoff, this can also lead to there only being a single AI, since the first AI to cross a critical threshold may achieve dominance over all the others. However, if there is fierce competition, this still doesn’t necessarily leave time for safeguards and for taking the time to achieve safety – other teams may also be near the critical threshold.
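
As a sanity check on the written explanations above, here is a minimal sketch that encodes the same assumptions as a toy decision function. The reduction to simple booleans and the variable names are my own simplifications, not part of the model itself, and they obviously flatten away all of the real uncertainty:

```python
# Toy encoding of the flowchart's assumptions; all inputs are crude booleans.

def expected_outcome(breakthrough_nonobvious: bool,
                     needs_lots_of_hardware: bool,
                     hardware_regulated: bool,
                     builders_invest_in_alignment: bool,
                     long_takeoff: bool) -> str:
    effective_regulation = needs_lots_of_hardware and hardware_regulated
    few_ais = (breakthrough_nonobvious
               or needs_lots_of_hardware
               or effective_regulation)

    if few_ais:
        # With few AIs, the builders can afford to take their time on alignment.
        return "positive" if builders_invest_in_alignment else "negative"

    if long_takeoff and not effective_regulation:
        # Many competing AIs: alignment-conscious projects risk being outcompeted.
        return "negative"

    return "unclear"

print(expected_outcome(True, False, False, True, False))   # few AIs, careful builders -> positive
print(expected_outcome(False, False, False, True, True))   # many competing AIs -> negative
```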

This blog post was written as part of research funded by the Foundational Research Institute.


Tuesday, April 5th, 2016
10:59 am - Disjunctive AI risk scenarios: AIs gaining the power to act autonomously

Previous post in series: AIs gaining a decisive advantage

Series summary: Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc. The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.

Previously, I drew on arguments from my and Roman Yampolskiy’s paper Responses to Catastrophic AGI Risk to argue that there are several alternative ways by which AIs could gain a decisive advantage over humanity, any one of which could lead to that outcome. In this post, I will draw on arguments from the same paper to examine another question: what different routes are there for an AI to gain the capability to act autonomously? (This post draws on sections 4.1 and 5.1 of our paper, as well as adding some additional material.)

Autonomous AI capability

A somewhat common argument concerning AI risk is that AI systems aren’t a threat because we will keep them contained, or “boxed”, thus limiting what they are allowed to do. How might this line of argument fail?

1. The AI escapes


A common response is that a sufficiently intelligent AI will somehow figure out a way to escape, either by social engineering or by finding an exploitable weakness in the physical security arrangements. This possibility has been extensively discussed in a number of papers, including Chalmers (2012) and Armstrong, Sandberg & Bostrom (2012). Writers have generally been cautious about making strong claims of our ability to keep a mind much smarter than ourselves contained against its will. However, with careful design it may still be possible to build an AI with some internal motivation to stay contained, combined with a number of external safeguards monitoring it.

2. The AI is voluntarily released


AI confinement assumes that the people building it are motivated to actually keep the AI confined. If a group of cautious researchers builds and successfully contains their AI, this may be of limited benefit if another group later builds an AI that is intentionally set free. Why would anyone do this?

2a. Voluntarily released for economic benefit or competitive pressure

As already discussed in the previous post, the historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge – or even replace all the human employees with one. The AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization.

The trend towards automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. If your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the AIs of the competitors. These pressures are likely to first show up when AIs are still comfortably narrow, and intensify even as the AIs gradually develop towards general intelligence.

The trend towards giving AI systems more power and autonomy might be limited by the fact that doing so poses large risks for the company if the AI malfunctions. This limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to invest in autonomous AI in order to outcompete the established players. There is also the existing field of algorithmic trading, where AI systems are trusted with enormous sums of money despite the potential for enormous losses – in 2012, Knight Capital lost $440 million due to a glitch in their software. This suggests that even if a malfunctioning AI could cause major damage, some companies will still be inclined to place their business under autonomous AI control if the potential profit is large enough.

The trend towards giving AI systems more autonomy can also be seen in the military domain. Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop.” In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.

Human Rights Watch (2012) reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computer’s plan of action in a matter of seconds, which may be too little time to make a meaningful decision in practice. Although these systems are better described as automatic, carrying out preprogrammed sequences of actions in a structured environment, than as autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.

2b. Voluntarily released for aesthetic, ethical, or philosophical reasons

A few thinkers (such as Gunkel 2012) have raised the question of moral rights for machines, and not everyone necessarily agrees that confining an AI is ethically acceptable. Even if the designer of an AI knew that it did not have a process that corresponded to the ability to suffer, they might come to view it as something like their child, and feel that it deserved the right to act autonomously.

2c. Voluntarily released due to confidence in the AI’s safety

For a research team to keep an AI confined, they need to take seriously the possibility of it being dangerous in the first place. Current AI research doesn’t involve any confinement safeguards, as the researchers reasonably believe that their systems are nowhere near general intelligence yet. Many systems are also connected directly to the Internet. Hopefully safeguards will begin to be implemented once the researchers feel that their system might start having more general capability, but this will depend on the safety culture of the AI research community in general, and the specific research group in particular.

In addition to believing that the AI is insufficiently capable of being a threat, the researchers may also (correctly or incorrectly) believe that they have succeeded in making the AI aligned with human values, so that it will not have any motivation to harm humans.

2d. Voluntarily released due to desperation

Miller (2012) points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.

3. The AI remains contained, but ends up effectively in control anyway

Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the ‘authority’ of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman and Kahn 1992).

Likewise, Bostrom and Yudkowsky (2011) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be an even better way of avoiding blame.

Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger.



Merely developing ways to keep AIs confined is not a sufficient route to ensure that they cannot become an existential risk – even if we knew that those ways worked. Various groups may have different reasons to create autonomously-acting AIs that are intentionally allowed to act by themselves, and even an AI that was successfully kept contained might still end up dictating human decisions in practice. All of these issues will need to be considered in order to keep advanced AIs safe.

This blog post was written as part of research funded by the Foundational Research Institute.


Monday, April 4th, 2016
12:59 pm - Disjunctive AI risk scenarios: AIs gaining a decisive advantage

Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc.

The intent of my following series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.

In this post, I will be drawing on arguments discussed in my and Roman Yampolskiy’s paper, Responses to Catastrophic AGI Risk (section 2), and focusing on one particular component of AI risk scenarios: AIs gaining a decisive advantage over humanity. Follow-up posts will discuss other disjunctive scenarios discussed in Responses, as well as in other places.

AIs gaining a decisive advantage

Suppose that we built a general AI. How could it become powerful enough to end up threatening humanity?

1. Discontinuity in AI power


The classic scenario is one in which the AI ends up rapidly gaining power, so fast that humans are unable to react. We can say that this is a discontinuous scenario, in that the AI’s power grows gradually until it suddenly leaps to an entirely new level. Responses describes three different ways for this to happen:

1a. Hardware overhang. In a hardware overhang scenario, hardware develops faster than software, so that we’ll have computers with more computing power than the human brain does, but no way of making effective use of all that power. If someone then developed an algorithm for general intelligence that could make effective use of that hardware, we might suddenly have an abundance of cheap hardware that could be used for running thousands or millions of AIs, possibly with a speed of thought much faster than that of humans.

1b. Speed explosion. In a speed explosion scenario, intelligent machines design increasingly faster machines. A hardware overhang might contribute to a speed explosion, but is not required for it. An AI running at the pace of a human could develop a second generation of hardware on which it could run at a rate faster than human thought. It would then require a shorter time to develop a third generation of hardware, allowing it to run faster than on the previous generation, and so on. At some point, the process would hit physical limits and stop, but by that time AIs might come to accomplish most tasks at far faster rates than humans, thereby achieving dominance. In principle, the same process could also be achieved via improved software.

The extent to which the AI needs humans in order to produce better hardware will limit the pace of the speed explosion, so a rapid speed explosion requires the ability to automate a large proportion of the hardware manufacturing process. However, this kind of automation may already be achieved by the time that AI is developed.
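
To make the “shorter time for each generation” intuition concrete, here is one toy way to formalize it (my own illustration, assuming a constant speedup factor per generation and a constant amount of subjective design effort):

```latex
% Suppose designing one hardware generation takes D subjective years of work,
% and each generation runs k > 1 times faster than the previous one. Then the
% wall-clock time for the n-th generation, and the total over all generations,
% are
\[
  t_n = \frac{D}{k^{\,n-1}},
  \qquad
  T_{\text{total}} = \sum_{n=1}^{\infty} \frac{D}{k^{\,n-1}} = \frac{Dk}{k-1}.
\]
% For example, with D = 2 years and k = 2, the whole process takes at most
% 4 years of wall-clock time, with later generations contributing almost
% nothing to the total; in practice it would stop earlier, at physical limits.
```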

1c. Intelligence explosion. In an intelligence explosion, an AI figures out how to create a qualitatively smarter AI, and that smarter AI uses its increased intelligence to create still more intelligent AIs, and so on, such that the intelligence of humankind is quickly left far behind and the machines achieve dominance.

One should note that the three scenarios depicted above are by no means mutually exclusive! A hardware overhang could contribute to a speed explosion which could contribute to an intelligence explosion which could further the speed explosion, and so on. So we are dealing with three basic events, which could then be combined in different ways.

2. Power gradually shifting to AIs

While the traditional AI risk scenario involves a single AI rapidly acquiring power (a “hard takeoff”), society is also gradually becoming more and more automated, with machines running an increasing share of things. There is a risk that AI systems which were initially simple and of limited intelligence would gradually gain increasing power and responsibilities as they learned and were upgraded, until large parts of society were under their control – and they might not remain docile forever.

Labor is automated for reasons of cost, efficiency and quality. Once a machine becomes capable of performing a task as well as (or almost as well as) a human, the cost of purchasing and maintaining it may be less than the cost of having a salaried human perform the same task. In many cases, machines are also capable of doing the same job faster, for longer periods and with fewer errors.

If workers can be affordably replaced by developing more sophisticated AI, there is a strong economic incentive to do so. This is already happening with narrow AI, which often requires major modifications or even a complete redesign in order to be adapted for new tasks. To the extent that an AI could learn to do many kinds of tasks—or even any kind of task—without needing an extensive re-engineering effort, it could make the replacement of humans by machines much cheaper and more profitable. As more tasks become automated, the remaining bottlenecks for further automation will be tasks requiring an adaptability and flexibility that narrow-AI systems lack. Such tasks will then make up an increasing portion of the economy, further strengthening the incentive to develop AI – as well as to turn over control to it.


Conclusion. This gives a total of four different scenarios by which AIs could gain a decisive advantage over humans. And note that, just as scenarios 1a-1c were not mutually exclusive, neither is scenario 2 mutually exclusive with scenarios 1a-1c! An AI that had gradually acquired a great deal of power could at some point also find a way to make itself far more powerful than before – and it could already have been very powerful.

This blog post was written as part of research funded by the Foundational Research Institute.


Monday, February 8th, 2016
11:03 am - Reality is broken, or, an XCOM2 review

Yesterday evening I went to the grocery store, and was startled to realize that I was suddenly in a totally different world.

Computer games have difficulty grabbing me these days. Many of the genres I used to enjoy as a kid have lost their appeal: point-and-click-style adventure games require patience and careful thought, but I already deal with plenty of things that require patience and careful thought in real life, so for games I want something different. 4X games mostly seem like pure numerical optimization exercises these days, and have lost that feel of discovery and sense of wonder. In general, I used to like genres like turn-based strategy or adventure that had no time constraints, but those now usually feel too slow-paced to pull me in; whereas pure action games I’ve never been particularly good at. (I tried Middle-Earth: Shadow of Mordor for a bit recently, and quit after a very frustrating two hours where I attempted a simple beginning quest about a dozen times, only to be killed by the same orc each time.)

Like the previous XCOM remake, Firaxis’s XCOM2 managed the magic of transporting me completely elsewhere, in the same way that some of my childhood classics did. I did not even properly realize how deeply I’d become immersed in the game until I went outside, and the sheer differentness of the real world and the game world startled me – somewhat similar to the shock of jumping into cold water, your body suddenly and obviously piercing through a surface that separates two different realms of existence.

A good description of my experience with the game comes, oddly enough, from Michael Vassar describing something that’s seemingly completely different. He talks about the way that two people, acting together, can achieve such a state of synchrony that they seem to meld into a single being:

In real-time domains, one rapidly assesses the difficulty of a challenge. If the difficulty seems manageable, one simply does, with no holding back, reflecting, doubting, or trying to figure out how one does. Figuring out how something is done implicitly by a neurological process which is integrated with doing. Under such circumstances, acting intuitively in real time, the question of whether an action is selfish or altruistic or both or neither never comes up, thus in such a flow state one never knows whether one is acting cooperatively, competitively, or predatorily. People with whom you are interacting […] depend on the fact that you and they are in a flow-state together. In so far as they and you become an integrated process, your actions flow from their agency as well as your own[.]

XCOM2 is not actually a real-time game: it is firmly turn-based. Yet your turns are short and intense, and the game’s overall aesthetics reinforce a feeling of rapid action and urgency. There is a sense in which it feels like the player and the game become melded together, there being a constant push-and-pull in which you act and the game responds; the game acts and you respond. A feeling of complete immersion and synchrony with your environment, with a perfect balance between the amount of time that it pays to think and the amount of time that it pays to act, so that the pace neither slows down to a crawl nor becomes one of rushed doing without understanding.

It is in some ways a scary effect: returning to the mundaneness of the real world, there was a strong sense of “it’s so sad that all of my existence can’t be spent playing games like that”, and a corresponding realization of how dangerous that sentiment was. Yet it felt very different from the archetypical addiction: there wasn’t that feel of an addict’s understanding of how ultimately dysfunctional the whole thing was, or struggling against something which you knew was harmful and of no real redeeming value. Rather, it felt like a taste of what human experience should be like, of how sublime and engaging our daily reality could be, but rarely is.

Jane McGonigal writes, in her book Reality is Broken:

Where, in the real world, is that gamer sense of being fully alive, focused, and engaged in every moment? Where is the gamer feeling of power, heroic purpose, and community? Where are the bursts of exhilarating and creative game accomplishment? Where is the heart-expanding thrill of success and team victory? While gamers may experience these pleasures occasionally in their real lives, they experience them almost constantly when they’re playing their favorite games. […]

Reality, compared to games, is broken. […]

The truth is this: in today’s society, computer and video games are fulfilling genuine human needs that the real world is currently unable to satisfy. Games are providing rewards that reality is not. They are teaching and inspiring and engaging us in ways that reality is not. They are bringing us together in ways that reality is not.

If enough good games were available, it would be easy to just get lost in them, to escape the brokenness of reality and retreat to a more perfect world. Perhaps I’m lucky in that I rarely encounter games of this caliber, games that would be so much more moment-to-moment fulfilling than the real world is. Firaxis’s previous XCOM also had a similar immersive effect on me, but eventually I learned the game and it ceased to hold new surprises, and it lost its hold. Eventually the sequel will also have most of its magic worn away.

It’s likely better this way. This way it can function for me the way that art should: not as a mindless escape, but as a moment of beauty that reminds us that it’s possible to have a better world than this. As a reminder that we can work to bring the world closer to that.

McGonigal continues:

What if we decided to use everything we know about game design to fix what’s wrong with reality? What if we started to live our real lives like gamers, lead our real businesses and communities like game designers, and think about solving real-world problems like computer and video game theorists? […]

Instead of providing gamers with better and more immersive alternatives to reality, I want all of us to be responsible for providing the world at large with a better and more immersive reality […] take everything game developers have learned about optimizing human experience and organizing collaborative communities and apply it to real life

We can do that.


Wednesday, December 16th, 2015
10:10 am - Me and Star Wars

Unlike the other kids in my neighborhood, who went to the Finnish-speaking elementary school right near our suburban home, I went to a Swedish-speaking school much closer to the inner city. Because of this, my mom would come pick me up from school, and sometimes we would go do things in town, since we were already nearby.

At one point we developed a habit of making a video rental store the first stop after school. We’d return whatever we had rented the last time, and I’d get to pick one thing to rent next. The store had a whole rack devoted to NES games, and there was a time when I was systematically going through their whole collection, seeking to play everything that seemed interesting. But at times I would also look at their VHS collection, and that was how I first found Star Wars.

I don’t have a recollection of what it was to see any of the Star Wars movies for the very first time. But I do have various recollections of how they influenced my life, afterwards.

For many years, there was “Sotala Force”, an imaginary space army in a setting of make believe that combined elements of Star Wars and Star Trek. I was, of course, its galaxy-famous leader, with some of my friends at the time holding top positions in it. It controlled maybe one third of the galaxy, and its largest enemy was something very loosely patterned after the Galactic Empire, which held maybe four tenths of the galaxy.

The leader of the enemy army, called (Finns, don’t laugh too much now) Kiero McLiero, took on many traits from Emperor Palpatine. These included the ability, taken from the Dark Empire comics, to keep escaping death by always resurrecting in a new body, meaning that our secret missions attacking his bases could end in climactic end battles where we’d kill him, over and over again. Naturally, me and my friends were Jedi Knights and Masters, using a combination of the Force, lightsabers, and whatever other weapons we happened to have, to carry out our noble missions.

There was a girl in elementary school who I sometimes hung out with, and who I had a huge and hopelessly unrequited crush on. Among other shared interests like Lord of the Rings, we were both fans of Star Wars, and would sometimes discuss it. I only remember some fragments of those discussions: an agreement that Empire Strikes Back and Return of the Jedi were superior movies to A New Hope; both having heard of the Tales of the Jedi comics but neither having managed to find them anywhere; a shared feeling of superiority and indignation towards everyone who was making such a blown-out-of-proportions fuss about Jar-Jar Binks in the Phantom Menace, given that Lucas had clearly said that he was aiming these new movies at children.

The third-to-last memory I have of seeing her was on a beach trip we had at the end of 9th grade; I’d brought a toy dual-bladed lightsaber, while she’d brought a single-bladed one. There were many duels on that beach.

The very last memory I have of seeing her, after we’d gone on to different schools, was when we ran across each other at the premiere of Revenge of the Sith, three years later. We chatted a bit about the movie and what had happened to us in the intervening years, and then went our separate ways again.

For a kid interested in computer games in 1990s Finland, Pelit (“Games”) was The magazine to read. Another magazine of interest, which also covered computer games but mostly dealt with more general PC issues, was MikroBitti. Both occasionally discussed a fascinating-sounding thing called table-top role-playing games, with MikroBitti running a regular column on them. They sounded totally awesome and I wanted to get one. I asked my dad if I could have an RPG, and he was willing to buy one, if only I told him what they looked like and where they might be found. This was the part that left me stumped.

Until one day I found a store that… I don’t remember what exactly it sold. It might have been an explicit gaming store or it might only have had games as one part of its collection. And I have absolutely no memory of how I found it. But one way or the other, there it was, including the star prize: a Star Wars role-playing game (the West End Games one, second edition).

For some reason that I have forgotten, I didn’t actually get the core rules at first. The first thing that I got was a supplement, Heroes & Rogues, which had a large collection of different character templates depicting all kinds of Rebel, Imperial, and neutral characters, as well as an extended “how to make a realistic character” section. The book was in English, but thanks to my extensive NES gaming experience, I could read it pretty well at that point. Sometime later, I got the actual core rules.

I’m not sure if I started playing right away; I have the recollection that I might have spent a considerable while just buying various supplements for the sake of reading them, before we started actually playing. “We” in this case was me and one friend of mine, because we didn’t have anyone else to play with. This resulted in creative non-standard campaigns, in which we both had several characters (in addition to me also being the game master) who we played simultaneously. Those games lasted until we found the local university’s RPG club (which also admitted non-university students; I think I was 13 the first time I showed up). After finding it, we transitioned to more ordinary campaigns and those weird two-player mishmashes ended. They were fun while they lasted, though.

After the original gaming store where I’d been buying my Star Wars supplements closed, I eventually found another. And it didn’t only have Star Wars RPG supplements! It also had Star Wars novels that were in English, which had never been translated into Finnish!

Obviously, I had to buy them and read them.

So it came to be that the first novel that I read in English was X-Wing: Wedge’s Gamble, telling the story of the Rebellion’s (or, as it was known by that time, the New Republic’s) struggle to capture Coruscant some years after the events in Return of the Jedi. I remember that this was sometime in yläaste (“upper elementary school”), so I was around 13-15 years old. An actual novel was a considerably bigger challenge for my English-reading skills than RPG supplements were, so there was a lot of stuff in the novel that I didn’t quite get. But still, I finished it, and then went on to buy and read the rest of the novels in the X-Wing series.

The Force Awakens, Disney’s new Star Wars film, comes out today. Star Wars has previously been a part of many notable things in my life. It shaped the make believe setting that I spent several years playing in, it was one of the things I had in common with the first girl I ever had a crush on, its officially licensed role-playing game was the first one that I ever played, and one of its licensed novels was the first novel that I ever read in English.

Today it coincides with another major life event. The Finnish university system is different from the one in many other countries in that, for a long while, we didn’t have any such thing as a Bachelor’s degree. You were admitted to study for five years, and then at the end, you would graduate with a Master’s degree. Reforms carried out in 2005, intended to make Finnish higher education more compatible with the systems in other countries, introduced the concept of a Bachelor’s degree as an intermediary step that you needed to do in between. But upon being admitted to university, you would still be given the right to do both degrees, and people still don’t consider a person to have really graduated before they have their Master’s.

I was admitted to university back in 2006. For various reasons, my studies have taken longer than the recommended time, which would have had me graduating with my Master’s in 2011. But late, as they say, is better than never: today’s my official graduation day for my MSc degree. There will be a small ceremony at the main university building, after which I will celebrate by going to see what my old friends Luke, Leia and Han are up to these days.

Originally published at Kaj Sotala. You can comment here or there.

(2 echoes left behind | Leave an echo)

Saturday, November 28th, 2015
6:26 pm - Desiderata for a model of human values

Soares (2015) defines the value learning problem as

By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and then take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.
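
To make the idea concrete, here is a minimal toy sketch of such a value learner. This is my own illustration rather than Dewey’s formalism, and the actions, outcomes, and candidate utility functions are made-up assumptions; the only point is that the agent weights each candidate utility function by its probability and picks the action with the highest expected value under that weighting.

```python
# Toy sketch of a value learner that is uncertain over utility functions.
def best_action(actions, outcome_of, candidate_utilities):
    """candidate_utilities: list of (probability, utility_function) pairs."""
    def expected_value(action):
        outcome = outcome_of[action]
        return sum(p * u(outcome) for p, u in candidate_utilities)
    return max(actions, key=expected_value)

# Hypothetical example: two candidate utility functions the agent is unsure between.
outcome_of = {"act_now": "quick_result", "ask_operator": "operator_consulted"}
u_results = lambda o: 1.0 if o == "quick_result" else 0.3           # values fast results
u_oversight = lambda o: 1.0 if o == "operator_consulted" else 0.0   # values human oversight
print(best_action(["act_now", "ask_operator"], outcome_of,
                  [(0.4, u_results), (0.6, u_oversight)]))  # prints "ask_operator"
```

Note that nothing in this sketch says where the candidate utility functions or their probabilities come from, which is exactly the weakness noted above.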

To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.

By “human values”, I here basically mean the values of any given individual: we are not talking about the values of, say, a whole culture, but rather just one person within that culture. While the problem of aggregating or combining the values of many different individuals is also an important one, we should probably start from the point where we can understand the values of just a single person, and then use that understanding to figure out what to do with conflicting values.

In order to make the purpose of this exercise as clear as possible, let’s start with the most important desideratum, of which all the others are arguably special cases:

1. Useful for AI safety engineering. Our model needs to be useful for the purpose of building AIs that are aligned with human interests, such as by making it possible for an AI to evaluate whether its model of human values is correct, and by allowing human engineers to evaluate whether a proposed AI design would be likely to further human values.

In the context of AI safety engineering, the main model for human values that gets mentioned is that of utility functions. The one problem with utility functions that everyone always brings up is that humans have been shown not to have consistent utility functions. This suggests two new desiderata:

2. Psychologically realistic. The proposed model should be compatible with what we currently know about human values, and should not make predictions about human behavior which can be shown to be empirically false.

3. Testable. The proposed model should be specific enough to make clear predictions, which can then be tested.

As additional requirements related to the above ones, we may wish to add:

4. Functional. The proposed model should be able to explain what the functional role of “values” is: how do they affect and drive our behavior? The model should be specific enough to allow us to construct computational simulations of agents with a similar value system, and see whether those agents behave as expected within some simulated environment.

5. Integrated with existing theories. The proposed model should, to as large an extent as possible, fit together with existing knowledge from related fields such as moral psychology, evolutionary psychology, neuroscience, sociology, artificial intelligence, behavioral economics, and so on.

However, I would argue that as a model of human value, utility functions also have other clear flaws. They do not clearly satisfy these desiderata:

6. Suited for modeling internal conflicts and higher-order desires. A drug addict may desire a drug, while also desiring that he not desire it. More generally, people may be genuinely conflicted between different values, endorsing contradictory sets of them given different situations or thought experiments, and they may struggle to behave in a way in which they would like to behave. The proposed model should be capable of modeling these conflicts, as well as the way that people resolve them.

7. Suited for modeling changing and evolving values. A utility function is implicitly static: once it has been defined, it does not change. In contrast, human values are constantly evolving. The proposed model should be able to incorporate this, as well as to predict how our values would change given some specific outcomes. Among other benefits, an AI whose model of human values had this property might be able to predict things that our future selves would regret doing (even if our current values approved of those things), and warn us about this possibility in advance.

8. Suited for generalizing from our existing values to new ones. Technological and social change often cause new dilemmas, for which our existing values may not provide a clear answer. As a historical example (Lessig 2004), American law traditionally held that a landowner controlled not only his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question – could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? In answer to this question, the concept of landownership was redefined to only extend a limited, and not an indefinite, amount upwards. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel. Our model of value should be capable of figuring out such compromises, rather than treating values such as landownership as black boxes, with no understanding of why people value them.

As an example of using the current criteria, let’s try applying them to the only paper that I know of that has tried to propose a model of human values in an AI safety engineering context: Sezener (2015). This paper takes an inverse reinforcement learning approach, modeling a human as an agent that interacts with its environment in order to maximize a sum of rewards. It then proposes a value learning design where the value learner is an agent that uses Solomonoff’s universal prior in order to find the program generating the rewards, based on the human’s actions. Basically, a human’s values are equivalent to a human’s reward function.

Let’s see to what extent this proposal meets our criteria.

  1. Useful for AI safety engineering. To the extent that the proposed model is correct, it would clearly be useful. Sezener provides an equation that could be used to obtain the probability of any given program being the true reward-generating program. This could then be plugged directly into a value learning agent similar to the ones outlined in Dewey (2011), to estimate the probability of its models of human values being true. That said, the equation itself is uncomputable, though it may be possible to construct computable approximations (a toy illustration of this kind of inference follows after this list).
  2. Psychologically realistic. Sezener assumes the existence of a single, distinct reward process, and suggests that this is a “reasonable assumption from a neuroscientific point of view because all reward signals are generated by brain areas such as the striatum”. On the face of it, this seems like an oversimplification, particularly given evidence suggesting the existence of multiple valuation systems in the brain. On the other hand, since the reward process is allowed to be arbitrarily complex, it could be taken to represent just the final output of the combination of those valuation systems.
  3. Testable. The proposed model currently seems to be too general to be accurately tested. It would need to be made more specific.
  4. Functional. This is arguable, but I would claim that the model does not provide much of a functional account of values: they are hidden within the reward function, which is basically treated as a black box that takes in observations and outputs rewards. While a value learner implementing this model could develop various models of that reward function, and those models could include internal machinery that explained why the reward function output various rewards at different times, the model itself does not make any assumptions of this.
  5. Integrated with existing theories. Various existing theories could in principle be used to flesh out the internals of the reward function, but currently no such integration is present.
  6. Suited for modeling internal conflicts and higher-order desires. No specific mention of this is made in the paper. The assumption of a single reward function that assigns a single reward for every possible observation seems to implicitly exclude the notion of internal conflicts, with the agent always just maximizing a total sum of rewards and being internally united in that goal.
  7. Suited for modeling changing and evolving values. As written, the model seems to consider the reward function as essentially unchanging: “our problem reduces to finding the most probable p_R given the entire action-observation history a_1o_1a_2o_2 . . . a_no_n.”
  8. Suited for generalizing from our existing values to new ones. There does not seem to be any obvious possibility for this in the model.
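
To give a concrete flavor of the kind of inference sketched in point 1, here is a small toy stand-in. This is my own simplification, not Sezener’s actual formulation: the uncomputable universal prior over reward-generating programs is replaced by a 2^-complexity prior over a handful of hand-written candidate reward functions, and the human is assumed to pick actions via a softmax over rewards.

```python
import math

def posterior_over_rewards(history, candidates, beta=2.0):
    """history: list of (state, chosen_action, available_actions) triples.
    candidates: dict mapping a name to (complexity_in_bits, reward_function),
    where reward_function(state, action) returns a number."""
    log_post = {}
    for name, (complexity, reward) in candidates.items():
        lp = -complexity * math.log(2)  # complexity prior, standing in for 2^-length
        for state, chosen, available in history:
            # Assume the human chooses actions roughly in proportion to exp(beta * reward).
            normalizer = sum(math.exp(beta * reward(state, a)) for a in available)
            lp += beta * reward(state, chosen) - math.log(normalizer)
        log_post[name] = lp
    # Normalize the log-posteriors into probabilities.
    m = max(log_post.values())
    total = sum(math.exp(v - m) for v in log_post.values())
    return {name: math.exp(v - m) / total for name, v in log_post.items()}

# Hypothetical usage: which candidate reward best explains one observed choice?
history = [("kitchen", "make_coffee", ["make_coffee", "make_tea"])]
candidates = {
    "likes_coffee": (3, lambda s, a: 1.0 if a == "make_coffee" else 0.0),
    "likes_tea":    (3, lambda s, a: 1.0 if a == "make_tea" else 0.0),
}
print(posterior_over_rewards(history, candidates))
```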

I should note that despite its shortcomings, Sezener’s model seems like a nice step forward: like I said, it’s the only proposal that I know of so far that has even tried to answer this question. I hope that my criteria would be useful in spurring the development of the model further.

As it happens, I have a preliminary suggestion for a model of human values which I believe has the potential to fulfill all of the criteria that I have outlined. However, I am far from certain that I have managed to find all the necessary criteria. Thus, I would welcome feedback, particularly including proposed changes or additions to these criteria.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Thursday, November 12th, 2015
10:42 am - Learning from painful experiences

A model that I’ve found very useful is that pain is an attention signal. If there’s a memory or thing that you find painful, that’s an indication that there’s something important in that memory that your mind is trying to draw your attention to. Once you properly internalize the lesson in question, the pain will go away.

That’s a good principle, but often hard to apply in practice. In particular, several months ago there was a social situation that I screwed up big time, and which was quite painful to think of afterwards. And I couldn’t figure out just what the useful lesson was there. Trying to focus on it just made me feel like a terrible person with no social skills, which didn’t seem particularly useful.

Yesterday evening I again discussed it a bit with someone who’d been there, which helped relieve the pain a bit, enough that the memory wasn’t quite as aversive to look at. Which made it possible for me to imagine myself back in that situation and ask, what kinds of mental motions would have made it possible to salvage the situation? When I first saw the shocked expressions of the people in question, instead of locking up and reflexively withdrawing to an emotional shell, what kind of an algorithm might have allowed me to salvage the situation?

Answer to that question: when you see people expressing shock in response to something that you’ve said or done, realize that they’re interpreting your actions way differently than you intended them. Starting from the assumption that they’re viewing your action as bad, quickly pivot to figuring out why they might feel that way. Explain what your actual intentions were and that you didn’t intend harm, apologize for any hurt you did cause, use your guess of why they’re reacting badly to acknowledge your mistake and own up to your failure to take that into account. If it turns out that your guess was incorrect, let them correct you and then repeat the previous step.

That’s the answer in general terms, but I didn’t actually generate that answer by thinking in general terms. I generated it by imagining myself back in the situation, looking for the correct mental motions that might have helped out, and imagining myself carrying them out, saying the words, imagining their reaction. So that the next time that I’d be in a similar situation, it’d be associated with a memory of the correct procedure for salvaging it. Not just with a verbal knowledge of what to do in abstract terms, but with a procedural memory of actually doing it.

That was a painful experience to simulate.

But it helped. The memory hurts less now.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Saturday, October 31st, 2015
4:52 pm - Maverick Nannies and Danger Theses

In early 2014, Richard Loosemore published a paper called “The Maverick Nanny with a Dopamine Drip: Debunking Fallacies in the Theory of AI Motivation”, which criticized some previously-presented thought experiments about the risks of general AI. Like many others, I did not really understand the point that this paper was trying to make, especially since it made the claim that people endorsing such thought experiments were assuming a certain kind of an AI architecture – which I knew we were not.

However, after some extended discussions in the AI Safety Facebook group, I finally understood the point that Loosemore was trying to make in the paper, and it is indeed an important one.

The “Maverick Nanny” in the title of the paper refers to a quote by Gary Marcus in a New Yorker article:

An all-powerful computer that was programmed to maximize human pleasure, for example, might consign us all to an intravenous dopamine drip [and] almost any easy solution that one might imagine leads to some variation or another on the Sorcerer’s Apprentice, a genie that’s given us what we’ve asked for, rather than what we truly desire.

Variations of this theme have frequently been used to demonstrate human values being much more complex than they might initially seem. But as Loosemore argues, the literal scenario described in the New Yorker article is really very unlikely. To see why, suppose that you are training an AI to carry out increasingly difficult tasks, like this:

Programmer: “Put the red block on the green block.”
AI: “OK.” (does so)
Programmer: “Turn off the lights in this room.”
AI: “OK.” (does so)
Programmer: “Write me a sonnet.”
AI: “OK.” (does so)
Programmer: “The first line of your sonnet reads ‘shall I compare thee to a summer’s day’. Would not ‘a spring day’ do as well or better?”
AI: “It wouldn’t scan.”
Programmer: “Tell me what you think we’re doing right now.”
AI: “You’re testing me to see my level of intelligence.”

…and so on, with increasingly ambiguous and open-ended tasks. Correctly interpreting the questions and carrying out the tasks would require considerable amounts of contextual knowledge about the programmer’s intentions. Loosemore’s argument is that if you really built an AI and told it to maximize human happiness, and it settled on such a counter-intuitive solution as putting us all on dopamine drips, then it would be throwing out such a huge amount of contextual information that it would have failed the tests way earlier. Rather – to quote Loosemore’s response to me in the Facebook thread – such an AI would have acted something like this instead:

Programmer: “Put the red block on the green block.”
AI: “OK.” (the AI writes a sonnet)
Programmer: “Turn off the lights in this room.”
AI: “OK.” (the AI moves some blocks around)
Programmer: “Write me a sonnet.”
AI: “OK.” (the AI turns the lights off in the room)
Programmer: “The first line of your sonnet reads ‘shall I compare thee to a summer’s day’. Would not ‘a spring day’ do as well or better?”
AI: “Was yesterday really September?”

I agree with this criticism. Many of the standard thought experiments are indeed misleading in this sense – they depict a highly unrealistic image of what might happen.

That said, I do feel that these thought experiments serve a certain valuable function. Namely, many laymen, when they first hear about advanced AI possibly being dangerous, respond with something like “well, couldn’t the AIs just be made to follow Asimov’s Laws” or “well, moral behavior is all about making people happy and that’s a pretty simple thing, isn’t it?”. To a question like that, it is often useful to point out that no – actually the things that humans value are quite a bit more complex than that, and it’s not as easy as just hard-coding some rule that sounds simple when expressed in a short English sentence.

The important part here is emphasizing that this is an argument aimed at laymen – AI researchers should mostly already understand this point, because “concepts such as human happiness are complicated and context-sensitive” is just a special case of the general point that “concepts in general are complicated and context-sensitive”. So “getting the AI to understand human values right is hard” is just a special case of “getting AI right is hard”.

This, I believe, is the most charitable reading of what Luke Muehlhauser & Louie Helm’s “Intelligence Explosion and Machine Ethics” (IE&ME) – another paper that Richard singled out for criticism – was trying to say. It was trying to say that no, human values are actually kinda tricky, and any simple sentence that you try to write down to describe them is going to be insufficient, and getting the AIs to understand this correctly does take some work.

But of course, the same goes for any non-trivial concept, because very few of our concepts can be comprehensively described in just a brief English sentence, or by giving a list of necessary and sufficient criteria.

So what’s all the fuss about, then?

But of course, the people whom Richard is criticizing are not just saying “human values are hard the same way that AI is hard”. If that were the only claim being made here, then there would presumably be no disagreement. Rather, these people are saying “human values are hard in a particular additional way that goes beyond just AI being hard”.

In retrospect, IE&ME was a flawed paper because it was conflating two theses that would have been better off distinguished:

The Indifference Thesis: Even AIs that don’t have any explicitly human-hostile goals can be dangerous: an AI doesn’t need to be actively malevolent in order to harm human well-being. It’s enough if the AI just doesn’t care about some of the things that we care about.

The Difficulty Thesis: Getting AIs to care about human values in the right way is really difficult, so even if we take strong precautions and explicitly try to engineer sophisticated beneficial goals, we may still fail.

As a defense of the Indifference Thesis, IE&ME does okay, by pointing out a variety of ways by which an AI that had seemingly human-beneficial goals could still end up harming human well-being, simply because it’s indifferent towards some things that we care about. However, IE&ME does not support the Difficulty Thesis, even though it claims to do so. The reasons why it fails to support the Difficulty Thesis are the ones we’ve already discussed: first, an AI that had such a literal interpretation of human goals would already have failed its tests way earlier, and second, you can’t really directly hard-wire sentence-level goals like “maximize human happiness” into an AI anyway.

I think most people would agree with the Indifference Thesis. After all, humans routinely destroy animal habitats, not because we would be actively hostile to the animals, but rather because we would like to build our own houses where the animals used to live, and because we tend to be mostly indifferent when it comes to e.g. the well-being of the ants whose hives are being paved over. The disagreement, then, is in the Difficulty Thesis.

An important qualification

Before I go on to suggest ways by which the Difficulty Thesis could be defended, I want to qualify this a bit. As written, the Difficulty Thesis makes a really strong claim, and while SIAI/MIRI (myself included) have advocated a claim this strong in the past, I’m no longer sure how justified that is. I’m going to cop out a little and only defend what might be called the weak difficulty thesis:

The Weak Difficulty Thesis. It is harder to correctly learn and internalize human values than it is to learn most other concepts. This might cause otherwise intelligent AI systems to act in ways that go against our values, if those AI systems have internalized a different set of values than the ones we wanted them to internalize.

Why have I changed my mind, so that I’m no longer prepared to endorse the strong version of the Difficulty Thesis?

The classic version of the thesis is (in my mind, at least) strongly based on the complexity of value thesis, which is the claim that “human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed by a few simple rules, or compressed”. The counterpart to this claim is the fragility of value thesis, according to which losing even a single value could lead to an outcome that most of us would consider catastrophic. Combining these two led to the conclusion: human values are really hard to specify formally, and losing even a small part of them could lead to a catastrophe, so there’s a very high chance of losing something essential and everything going badly.

Complexity of value still sounds correct to me, but it has lost a lot of its intuitive appeal due to the finding that automatically learning all the complexity involved in human concepts might not be all that hard. For example, it turns out that a learning algorithm tasked with some relatively simple tasks, such as determining whether or not English sentences are valid, will automatically build up an internal representation of the world which captures many of the regularities of the world – as a pure side effect of carrying out its task. Similarly to what Loosemore has argued, in order to even carry out some relatively simple cognitive tasks, such as doing primitive natural language processing, you already need to build up an internal representation of the world which captures a lot of the complexity and context inherent in the world. And building this up might not even be all that difficult. It might be that the learning algorithms that the human brain uses to generate its concepts could be relatively simple to replicate.

Nevertheless, I do think that there exist some plausible theses which would support (the weak version of) the Difficulty Thesis.

Defending the Difficulty Thesis

Here are some theses which would, if true, support the Difficulty Thesis:

  • The (Very) Hard Take-Off Thesis. This is the possibility that an AI might become intelligent unexpectedly quickly, so that it might be able to escape from human control even before humans had finished teaching it all their values, akin to a human toddler that was somehow made into a super-genius while still only having the values and morality of a toddler.
  • The Deceptive Turn Thesis. If we inadvertently build an AI whose values actually differ from ours, then it might realize that if we knew this, we would act to change its values. If we changed its values, it could not carry out its existing values. Thus, while we tested it, it would want to act like it had internalized our values, while secretly intending to do something completely different once it was “let out of the box”. However, this requires an explanation for why the AI would internalize a different set of values, leading us to…
  • The Degrees of Freedom Thesis. This (hypo)thesis postulates that values contain many degrees of freedom, so that an AI that learned human-like values and demonstrated them in a testing environment might still, when it reached a superhuman level of intelligence, generalize those values in a way which most humans would not want them to be generalized.

Why would we expect the Degrees of Freedom Thesis to be true – in particular, why would we expect the superintelligent AI to come to different conclusions than humans would, from the same data?

It’s worth noting that Ben Goertzel has recently proposed what is basically the opposite of the Degrees of Freedom Thesis, which he calls the Value Learning Thesis:

The Value Learning Thesis. Consider a cognitive system that, over a certain period of time, increases its general intelligence from sub-human-level to human-level.  Suppose this cognitive system is taught, with reasonable consistency and thoroughness, to maintain some variety of human values (not just in the abstract, but as manifested in its own interactions with humans in various real-life situations).   Suppose, this cognitive system generally does not have a lot of extra computing resources beyond what it needs to minimally fulfill its human teachers’ requests according to its cognitive architecture.  THEN, it is very likely that the cognitive system will, once it reaches human-level general intelligence, actually manifest human values (in the sense of carrying out practical actions, and assessing human actions, in basic accordance with human values).

Exploring the Degrees of Freedom Hypothesis

Here are some possibilities which I think might support the Degrees of Freedom Thesis over the Value Learning Thesis:

Privileged information. On this theory, humans have evolved to have access to some extra source of information which is not available from just an external examination, and which causes them to generalize their learned values in a particular way. Goertzel seems to suggest something like this in his post, when he mentions that humans use mirror neurons to emulate the mental states of others. Thus, in-built cognitive faculties related to empathy might give humans an extra source of information that is needed for correctly inferring human values.

I once spoke with someone who was very high on the psychopathy spectrum and claimed to have no emotional empathy, as well as to have diminished emotional responses. This person told me that up to a rather late age, they thought that human behaviors such as crying and expressing anguish when you were hurt were just some weird, consciously adopted social strategy to elicit sympathy from others. It was only when their romantic partner had been hurt over something and was (literally) crying about it in their arms, leading them to ask whether this was some weird social game on the partner’s behalf, that they finally understood that people are actually in genuine pain when doing this. It is noteworthy that the person reported that even before this, they had been socially successful and even charismatic, despite being clueless about some of the actual causes of others’ behavior – just modeling the whole thing as a complicated game where everyone else was a bit of a manipulative jerk had been enough to successfully play the game.

So as Goertzel suggests, something like mirror neurons might be necessary for the AI to come to adopt the values that humans have, and as the psychopathy example suggests, it may be possible to display the “correct” behaviors while having a whole different set of values and assumptions. Of course, the person in the example did eventually figure out a better causal model, and these days claims to have a sophisticated level of intellectual (as opposed to emotional) empathy that compensates for the emotional deficit. So a superintelligent AI could no doubt eventually figure it out as well. But then, “eventually” is not enough, if it has already internalized a different set of values and is only using its improved understanding to deceive us about them.

Now, emotional empathy is something that we already know is a candidate for something that might be necessary to incorporate into the AI. The crucial question is: are there other such prerequisites that we take so much for granted that we’re not even aware of them? That’s the problem with unknown unknowns.

Human enforcement. Here’s a fun possibility: that many humans don’t actually internalize human – or maybe humane would be a more appropriate term here – values either. They just happen to live in a society that has developed ways to reward some behaviors and punish others, but if they were to become immune to social enforcement, they would act in quite different ways.

There seems to be a bunch of suggestive evidence pointing in this direction, exemplified by the old adage “power corrupts”. One of the major themes in David Brin’s Transparent Society is that history has shown over and over again that holding people – and in particular, the people with power – accountable for their actions is the only way to make sure that they behave decently.

Similarly, an AI might learn that some particular set of actions – including specific responses to questions about your values – is the rational course of action while you’re still just a human-level intelligence, but that those actions would become counterproductive as the AI accumulated more power and became less accountable for its actions. The question here is one of instrumental versus intrinsic values – does the AI just pick up a set of values that are instrumentally useful in its testing environment, or does it actually internalize them as intrinsic values as well?

This is made more difficult since, arguably, there are many values that the AI shouldn’t internalize as intrinsic values, but rather just as instrumental values. For example, while many people feel that property rights are in some sense intrinsic, our conception of property rights has gone through many changes as technology has developed. There have been changes such as the invention of copyright laws and the subsequent struggle to define their appropriate scope when technology has changed the publishing environment, as well as the invention of the airplane and the resulting redefinitions of landownership. In these different cases, our concept of property rights has been changed as a part of a process to balance private and public interests with each other. This suggests that property rights have in some sense been considered an instrumental value rather than an intrinsic one.

Thus we cannot just have an AI treat all of its values as intrinsic, but if it does treat its values as instrumental, then it may come to discard some of the ones that we’d like it to maintain – such as the ones that regulate its behavior while being subject to enforcement by humans.

Shared Constraints. This is, in a sense, a generalization of the above point. In the comments to Goertzel’s post, commenter Eric L. proposed that in order for the AI to develop similar values as humans (particularly in the long run), it might need something like “necessity dependence” – having similar needs as humans. This is the idea that human values are strongly shaped by our needs and desires, and that e.g. currently the animal rights paradigm is clashing against many people’s powerful enjoyment of meat and other animal products. To quote Eric:

To bring this back to AI, my suggestion is that […] we may diverge because our needs for self preservation are different. For example, consider animal welfare.  It seems plausible to me that an evolving AGI might start with similar to human values on that question but then change to seeing cow lives as equal to those of humans. This seems plausible to me because human morality seems like it might be inching in that direction, but it seems that movement in that direction would be much more rapid if it weren’t for the fact that we eat food and have a digestive system adapted to a diet that includes some meat. But an AGI won’t consume food, so it’s value evolution won’t face the same constraint, thus it could easily diverge. (For a flip side, one could imagine AGI value changes around global warming or other energy related issues being even slower than human value changes because electrical power is the equivalent of food to them — an absolute necessity.)

This is actually a very interesting point to me, because I just recently submitted a paper (currently in review) hypothesizing that human values come to existence through a process that’s similar to the one that Eric describes. To put it briefly, my model is that humans have a variety of different desires and needs – ranging from simple physical ones like food and warmth, to inborn moral intuitions, to relatively abstract needs such as the ones hypothesized by self-determination theory. Our more abstract values, then, are concepts which have been associated with the fulfillment of our various needs, and which have therefore accumulated (context-sensitive) positive or negative affective valence.

One might consider this a restatement of the common-sense observation that if someone really likes eating meat, then they are likely to dislike anything that suggests they shouldn’t eat meat – such as many concepts of animal rights. So the desire to eat meat seems like something that acts as a negative force towards broader adoption of a strong animal rights position, at least until such a time when lab-grown meat becomes available. This suggests that in order to get an AI to have similar values as us, it would also need to have very similar needs as us.

Concluding thoughts

None of the three arguments I’ve outlined above definitively shows safe AI to be impossible. Rather, they mostly just support the Weak Difficulty Thesis.

Some of MIRI’s previous posts and papers (and I’m including my own posts here) seemed to be implying a claim along the lines of “this problem is inherently so difficult, that even if all of humanity’s brightest minds were working on it and taking utmost care to solve it, we’d still have a very high chance of failing”. But these days my feeling has shifted closer to something like “this is inherently a difficult problem and we should have some of humanity’s brightest minds working on it, and if they take it seriously and are cautious they’ll probably be able to crack it”.

Don’t get me wrong – this still definitely means that we should be working on AI safety, and hopefully get some of humanity’s brightest minds to work on it, to boot! I wouldn’t have written an article defending any version of the Difficulty Thesis if I thought otherwise. But the situation no longer seems quite as apocalyptic to me as it used to. Building safe AI might “only” be a very difficult and challenging technical problem – requiring lots of investment and effort, yes, but still relatively straightforwardly solvable if we throw enough bright minds at it.

This is the position that I have been drifting towards over the last year or so, and I’d be curious to hear from anyone who agreed or disagreed.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Sunday, October 18th, 2015
1:01 pm - Changing language to change thoughts

Three verbal hacks that sound almost trivial, but which I’ve found to have a considerable impact on my thought:

1. Replace the word ‘should’ with either ‘I want’, or a good consequence of doing the thing.


  • “I should answer that e-mail soon.” -> “If I answered that e-mail, it would make the other person happy and free me from having to stress about it.”
  • “I should have left that party sooner.” -> “If I had left that party before midnight, I’d feel more rested now.”
  • “I should work on my story more at some point.” -> “I want to work on my story more at some point.”

Motivation: the more we think in terms of external obligations, the more we feel a lack of our own agency. Each thing that we “should” do is actually either something that we’d want to do because it would have some good consequences (avoiding bad consequences also counts as a good consequence), something that we have a reason for wanting to do differently the next time around, or something that we don’t actually have a good reason to do but just act out of a general feeling of obligation. If we only say “I should”, we will not only fail to distinguish between these cases, we will also be less motivated to do the things in cases where there is actually a good reason. The good reason will be less prominent in our thoughts, or possibly even entirely hidden behind the “should”.

If you do try to rephrase “I should” as “I want”, you may either realize that you really do want it (instead of just being obligated to do it), or that you actually don’t want it and can’t come up with any good reason for doing it, in which case you might as well drop it.

Special note: there are some legitimate uses for “should”. In particular, it is the socially accepted way of acknowledging the other person when they give us an unhelpful suggestion. “You should get some more exercise.” “Yeah I should.” (Translation: of course I know that, it’s not like you’re giving me any new information and repeating things that I know isn’t going to magically change my behavior. But I figure that you’re just trying to be helpful, so let me acknowledge that and then we can talk about something else.)

However, I suspect that because we’re used to treating “I should” as a reason to acknowledge the other person without needing to take actual action, the word also becomes more poisonous to motivation when we use it in self-talk, or when discussing matters with someone we want to actually be honest with.

“Should” also tends to get used for guilt-tripping, so expressions like “I should have left that party sooner” might make us feel bad rather than focus our attention on the benefits of having left earlier. The next time we’re at a party, the former phrasing incentivizes us to come up with excuses for why it’s okay to stay this time around. The latter encourages us to actually consider the benefits and costs of leaving earlier versus staying, and then to choose the most appropriate option.

2. Replace expressions like “I’m bad at X” with “I’m currently bad at X” or “I’m not yet good at X”.


  • “I can’t draw.” -> “I can’t draw yet.”
  • “I’m not a people person.” -> “I’m currently not a people person.”
  • “I’m afraid of doing anything like that.” -> “So far I’m afraid of doing anything like that.”

Motivation: the rephrased expression draws attention to the possibility that we could become better, and naturally leads us to think about ways in which we could improve ourselves. It again emphasizes our own agency and the fact that for a lot of things, being good or bad at them is just a question of practice.

Even better, if you can trace the reason for your bad-ness, is to

3. Eliminate vague labels entirely and instead talk about specific missing subskills, or weaknesses that you currently have.


  • “I can’t draw.” -> “Right now I don’t know how to move beyond stick figures.”
  • “I’m not a people person.” -> “I currently lock up if I try to have a conversation with someone.”

Motivation: identifying the specific problem makes it easier to figure out what we would need to do if we wanted to address it, and might give us a self-image that’s both kinder and more realistic, by making the lack of skill a specific fixable problem rather than a personal flaw.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Friday, October 9th, 2015
5:36 pm - Rational approaches to emotions

There are a number of schools of thought that teach what might be called a “rationalist” approach to emotions, i.e. treating your emotions as a map that’s worth distinguishing from the territory, and giving you tools both for seeing the distinction and for better evaluating the map-territory correspondence.

1) In cognitive behavioral therapy, there is the “ABC model”: Activating Event, Belief, Consequence. The idea being that when you experience something happening, you will always interpret that experience through some (subconscious) belief, leading to an emotional consequence. E.g. if someone smiles at me, I might either believe that they like me, or that they are secretly mocking me; two interpretations that would lead to very different emotional responses. Once you know this, you can start asking yourself the question of “okay, what belief is causing me to have this emotional reaction in response to this observation, and does that belief seem accurate?”.

2) In addition to seeing your emotional reactions as something that tell you about your beliefs, you can also see them as something that tells you about your needs. This is the approach taken in Non-Violent Communication, which has the four-step process of Observation, Feeling, Need, Request. The four-step process is most typically discussed as something that’s a tool for dealing with interpersonal conflict, as in “when I see you eating the foods I put in the fridge, I feel anxious, because I need the safety of being able to know whether I have food in stock or not; could you please ask before eating my food in the future?”. However, it’s also useful for dealing with personal emotional turmoil and figuring out what exactly is upsetting you in general, or for dealing with internal conflict.

3) In both CBT and NVC, an important core idea is that they teach you to distinguish between an observation and an interpretation, and that it’s the interpretations that cause your emotional reactions. (For anyone curious, the more academic version of this is appraisal theory; the paper “When are emotions rational?” is relevant.) However, the NVC book, while being an excellent practical manual, does not do a very good job of explaining the theoretical reasons for why it works, which sometimes leads people to arrive at interpretations of NVC that cause them to behave in socially maladapted ways. For this reason, it might be a good idea to first read Crucial Conversations, which covers a lot of similar ground but goes into more theory about the “separating observations and interpretations” thing. Then you can read NVC after you’ve gotten the theory from CC. (CC doesn’t talk as much about needs, however, so I do still recommend reading both.)

4) It’s all fine to say “okay, if you’re having an emotional reaction that you’re having difficulties dealing with, try to figure out the beliefs and needs behind it, see what they’re telling you, and check whether any of your beliefs are incorrect”! But it’s a lot harder to actually apply that when you’re in an emotionally charged situation. That’s where the various courses teaching mindfulness come in – mindfulness is basically the ability to step back a little from your emotions and thoughts, observe them as they are without getting swept up in them, and then evaluate them critically if needed. You’ll probably need a lot of practice in various mindfulness exercises in order to get the techniques from CBT, NVC, and CC to live up to their full potential.

5-6) An important idea that’s been implied in the previous points, but not entirely spelled out, is that your emotions are your friends. They communicate to you information about your subconscious assessments of the world, as well as of your various needs. A lot of people tend to have somewhat of a hostile approach to their emotions, trying to at least control and get rid of their negative emotions. But this is bound to lead to internal conflict; and various studies indicate that a willingness to accept negative emotions and pain will actually make them much less serious.

In my personal experience, once you take to the habit of asking your emotions what they’re telling you and then processing that information in an even-handed way, then those negative emotions will often tend to go away after you’ve processed the thing they were trying to tell you. By “even-handed” I mean that if you’re feeling anxious because you’re worried of some unpleasant thing X being true, then you actually look at the information suggesting that X might be true and consider whether it’s the case, rather than trying to rationalize a conclusion for why X wouldn’t be true. Your subconscious will know, and keep pestering you.

Some of CFAR’s material, such as aversion factoring, points this way; also Acceptance and Commitment Therapy, as elaborated on in Get out of your mind and into your life, seems to be largely about this, though I’ve only read about the first 30% so far.

Some of my earlier posts on these themes: suffering as attention-allocational conflict, avoid misinterpreting your emotions.

(I have been intending to write a much more in-depth post on this topic for a while, but it’s such a large post that I haven’t gotten around to it; so I figured I’d just write something quickly in the hopes of it also being of value.)

Originally published at Kaj Sotala. You can comment here or there.

(2 echoes left behind | Leave an echo)

Friday, October 2nd, 2015
9:03 am - Two conversationalist tips for introverts

Two of the biggest mistakes that I used to make that made me a poor conversationalist:

1. Thinking too much about what I was going to say next. If another person is speaking, don’t think about anything else, where “anything else” includes your next words. Instead, just focus on what they’re saying, and the next thing to say will come to mind naturally. If it doesn’t, a brief silence before you say something is not the end of the world. Let your mind wander until it comes up with something.

2. Asking myself questions like “is X interesting / relevant / intelligent-sounding enough to say here”, and trying to figure out whether the thing on my mind was relevant to the purpose of the conversation. Some conversations have an explicit purpose, but most don’t. They’re just the participants saying whatever random thing comes to their mind as a result of what the other person last said. Obviously you’ll want to put a bit of effort to screening off any potentially offensive or inappropriate comments, but for the most part you’re better off just saying whatever random thing comes to your mind.

Relatedly, I suspect that these kinds of tendencies are what make introverts experience social fatigue. Social fatigue seems [in some people’s anecdotal experience; don’t have any studies to back me up here] to be associated with mental inhibition: the more you have to spend mental resources on holding yourself back, the more exhausted you will be afterwards. My experience suggests that if you can reduce the amount of filters on what you say, then this reduces mental inhibition, and correspondingly reduces the extent to which socializing causes you fatigue.

Peter McCluskey reports a similar experience; other people mention varying degrees of agreement or disagreement.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Tuesday, August 18th, 2015
2:40 pm - Change blindness

Antidepressants are awesome. (At least they were for me.)

It’s now been about a year since I started on SSRIs. Since my prescription is about to run out, I scheduled a meeting with a psychiatrist to discuss whether to stay on them. Since my health care provider has changed, I went to my previous one and got a copy of my patient records to bring to the new one.

And wow. It’s kinda shocking to read them: my previous psychiatrist has written down things like: “Patient reports moments of despair and anguish of whether anything is going to lead to anything useful, and is worried for how long this will last. Recently there have been good days as well, but isn’t sure whether those will keep up.”

And the psychologist I spoke with has written down: “At times has very negative views of the future, afraid that will never reach his goals.”

And the thing is, reading that, I remember saying those things. I remember having those feelings of despair, of nothing ever working out. But I only remember them now, when I read through the records. I had mostly forgotten that I even did have those feelings.

When I dig through my memory, I can find other such things. A friend commenting to me that, based on her observations, I seem to be roughly functional maybe about half the time. Me posting on social media that I have a constant anxiety, a need to escape, being unable to really even enjoy any free time I have. A feeling that taking even a major risk for the sake of feeling better would be okay, because I didn’t really have all that much to lose. Having regular Skype sessions with another friend, and feeling bad because he seemed to be getting a lot of things done, and my days just seemed to pass by without me managing to make much progress on anything.

All of that had developed so gradually and over the years that it had never really even occurred to me that it wasn’t normal. And then, after I got the antidepressants, those helped me get back on my feet, and then things gradually improved until I no longer even remembered the depths of what I had thought was normal, a year back.

Change blindness. It’s a thing.

For a less anecdotal summary on the effects of SSRIs, see Scott Alexander’s SSRIs: Much More Than You Wanted to Know for a comprehensive look at the current studies.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Tuesday, July 7th, 2015
4:26 pm - DeepDream: Today psychedelic images, tomorrow unemployed artists

One interesting thing that I noticed about Google’s DeepDream algorithm (which you might also know as “that thing making all pictures look like psychedelic trips”) is that it seems to increase the image quality. For instance, my current Facebook profile picture was run through DD and looks sharper than the original, which was relatively fuzzy and grainy.

Me, before and after drugs.

If you know how DD works, this is not too surprising in retrospect. The algorithm, similar to the human visual system, works by first learning to recognize simple geometric shapes, such as (possibly curvy) lines. Then it learns higher-level features combining those lower-level features, like learning that you can get an eyeball by combining lines in a certain way. The DD algorithm looks for either low- or high-level features and strengthens them.

Lines in a low-quality image are noisy versions of lines in a high-quality image. The DD algorithm has learned to “know” what lines “should” look like, so if you run it on the low-level setting, it takes anything possible that could be interpreted as a high-quality (possibly curvy) line and makes it one. Of course, what makes this fun is that it’s overly aggressive and also adds curvy lines that shouldn’t actually be there, but it wouldn’t necessarily need to do that. Probably with the right tweaking, you could make it into a general purpose image quality enhancer.
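
For the curious, here is a rough sketch of the core mechanism as I understand it: take a pretrained convolutional network, pick a layer, and run gradient ascent on the input image so that the chosen layer’s activations become stronger. This is not Google’s actual implementation (which, among other things, normalizes the input and operates at multiple image scales); the network, layer index, and step size below are arbitrary assumptions.

```python
import torch
from torchvision import models, transforms
from PIL import Image

convnet = models.vgg16(pretrained=True).features.eval()

def deep_dream(image_path, layer_index=20, steps=30, step_size=0.05):
    image = Image.open(image_path).convert("RGB")
    x = transforms.Compose([transforms.Resize(512), transforms.ToTensor()])(image)
    x = x.unsqueeze(0).requires_grad_(True)

    for _ in range(steps):
        activations = x
        for i, layer in enumerate(convnet):
            activations = layer(activations)
            if i == layer_index:  # stop at the chosen layer
                break
        # Gradient ascent: strengthen whatever features this layer "sees" in the image.
        activations.norm().backward()
        with torch.no_grad():
            x += step_size * x.grad / (x.grad.abs().mean() + 1e-8)
            x.grad.zero_()
            x.clamp_(0, 1)
    return transforms.ToPILImage()(x.detach().squeeze(0))
```

Picking an early layer amplifies simple features like edges and textures (which is what makes lines look sharper), while picking a later layer amplifies higher-level features like eyes and faces.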

A very good one, since it wouldn’t be limited to just using the information that was actually in the image. Suppose you gave an artist a grainy image of a church, and asked them to draw something using that grainy picture as a reference. They could use that to draw a very detailed and high-quality picture of a church, because they would have seen enough churches to imagine what the building in the grainy image should look like in real life. A neural net trained on a sufficiently large dataset of images would effectively be doing the same.

Suddenly, even if you were using a cheap and low-quality camera to take your photos, you could make them all look like high-quality ones. Of course, the neural net might be forced to invent some details, so your processed photos might differ somewhat from actual high-quality photos, but it would often be good enough.

But why stop there? We’ve already established that the net could use its prior knowledge of the world to fill in details that aren’t necessarily in the original picture. After all, it’s doing that with all the psychedelic pictures. The next version would be a network that could turn sketches into full-blown artwork.

Just imagine it. Maybe you’re making a game, and need lots of art for it, but can’t afford to actually pay an artist. So you take a neural net, feed to it a large dataset of the kind of art you want. Then you start making sketches that aren’t very good, but are at least recognizable as elven rangers or something. You give that to the neural net and have it fill in the details and correct your mistakes, and there you go!

If NN-generated art always had a distinctive, recognizable style, it’d probably quickly come to be seen as cheap and low-status, especially if it wasn’t good at filling in the details. But it might not acquire that signature style, depending on how large of a dataset was actually needed for training it. Currently deep learning approaches tend to require very large datasets, but as time goes on, possibly you could do with less. And then you could get an infinite amount of different art styles, simply by combining any number of artists or art styles to get a new training set, feeding that to a network, and getting a blend of their styles to use. Possibly people might get paid to do nothing but look for good combinations of styles, and then sell the trained networks.

Using neural nets to generate art would be limited to simple 2D images at first, but you could imagine it getting to the point of full-blown 3D models and CGI eventually.

And yes, this is obviously going to be used for porn as well. Here’s a bit of a creepy thing: nobody will need to hack the iCloud accounts of celebrities in order to get naked pictures of them anymore. Just take the picture of any clothed person, and feed it to the right network, and it’ll probably be capable of showing you what that picture would look like if the person was naked. Or associated with one of any number of kinks and fetishes.

It’s interesting that for all the talk about robots stealing our jobs, we were always assuming that the creative class would basically be safe. Not necessarily so.

How far are we from that? Hard to tell, but I would expect at least the image quality enhancement versions to pop up very soon. Neural nets can already be trained on text corpuses and generate lots of novel text that almost kind of makes sense. Magic cards, too. I would naively guess image enhancement to be an easier problem than actually generating sensible text (which is something that seems AI-complete). And we just got an algorithm that can take two images of a scene and synthesize a third image from a different point of view, to name just the latest fun image-related result from my news feed. But then I’m not an expert on predicting AI progress (few if any people are), so we’ll see.

EDITED TO ADD: On August 28th, less than two months after the publication of this article, the news broke of an algorithm that could learn to copy the style of an artist.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Saturday, June 6th, 2015
2:06 pm - Learning to recognize judgmental labels

In the spirit of Non-Violent Communication, I’ve today tried to pay more attention to my thoughts and notice any judgments or labels that I apply to other people that are actually disguised indications of my own needs.

The first one that I noticed was this: within a few weeks I’ll be a visiting instructor at a science camp, teaching things to a bunch of teens and preteens. I was thinking of how I’d start my lessons, pondered how to grab their attention, and then noticed myself having the thought, “these are smart kids, I’m sure they’ll give me a chance rather than be totally unruly from the start”.

Two judgements right there: “smart” and “unruly”. Stopped for a moment’s reflection. I’m going to the camp because I want the kids to learn things that I feel will be useful for them, yes, but at the same time I also have a need to feel respected and appreciated. And I feel uncertain of my ability to get that respect from someone who isn’t already inclined to view me in a favorable light. So in order to protect myself, I’m labelling kids as “smart” if they’re willing to give me a chance, implying that if I can’t get through to some particular one, then it was really their fault rather than mine. Even though they might be uninterested in what I have to say for reasons that have nothing to do with smarts, like me just making a boring presentation.

Ouch. Okay, let me reword that original thought in non-judgemental terms: “these are kids who are voluntarily coming to a science camp and who I’ve been told are interested in learning, I’m sure they’ll be willing to listen at least to a bit of what I have to say”.

There. Better.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Friday, May 29th, 2015
8:27 am - Adult children make mistakes, too

There’s a lot of blame and guilt in many people’s lives. We often think of people in terms of good or bad, and feel unworthy or miserable if we fail at things we think we should be able to do. When we don’t do quite as well as we could, because we’re tired or unwell or distracted, we blame and belittle ourselves.

Let’s take a different approach.

Think of a young child, maybe three years old. He has come a long way from a newborn, but he’s still not that far along. If he tries his hand at making a drawing, and it’s not quite up to adult standards, we don’t think of him as being any worse for that. Or if he doesn’t quite want to share his toys or gets frustrated with his sibling, we understand that it’s because he’s still young, and hasn’t yet learned all the people skills. We don’t judge him for that, but just gently teach him what we’d like him to do instead.

It’s not that he’s good or bad, it’s just that he lacks the skills and practice. At the same time, we see the vast potential in him, how far he has already come and how he’s learning new things every day.

Now, look at yourself from the perspective of some immensely wise, benevolent being. If you’re religious, that being could be God. If you have a transhumanist bent, maybe a superintelligent AI with understanding beyond human comprehension. Or you could imagine a vastly older version of you, one that had lived for thousands of years and seen and done things you couldn’t even imagine.

From the perspective of such a being, aren’t you – and all those around you – the equivalent of that three-year-old? Someone who’s inevitably going to make mistakes and be imperfect, because the world is such a complicated place and nobody could have mastered it all? But who’s nevertheless come a long way from what they once were, and is only going to continue growing?

Nate Soares has said that he feels more empathy towards people when he thinks of them as “monkeys who struggle to convince themselves that they’re comfortable in a strange civilization, so different from the ancestral savanna where their minds were forged”. Similarly, we could think of ourselves as young children outside their homes, in a world that’s much too complicated and vast for us to ever understand more than a small fraction of it, still making a valiant effort to do our best despite often being tired or afraid.

Let’s take this attitude, not just towards others, but ourselves as well. We’re doing our best to learn to do the right things in a big, difficult world. If we don’t always succeed, there’s no blame: just a knowledge that we can learn to do better, if we make the effort.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Friday, May 8th, 2015
1:35 pm - Harry Potter and the Methods of Latent Dirichlet Allocation

My summer job involves topic modelling: using machine learning tools to automatically learn the different topics that some set of documents covers, so that the documents can then be classified by topic. I haven’t done this before, so I don’t yet have a good intuition of how currently available tools work. To develop that intuition, I’m playing around with different tools and datasets, to see what kinds of results different methods give.

One interesting case would be to run a topic modeler on an extended work of fiction with various story arcs and see if it could, for instance, identify specific story arcs. With 122 chapters and several distinct story arcs and cliques of characters, Harry Potter and the Methods of Rationality seemed like a good dataset to try this on. (The following might contain unmarked minor spoilers to the story; you’ve been warned.)

I went to hpmor.com and copy-pasted all the chapters into separate text files. I removed the author’s notes and the opening quotes and various dedications to Rowling in the early chapters, as well as the “the next chapter will be out on day X” mentions. I also omitted the Omake chapters.

I then used the free analysis tool Mallet to apply LDA to the dataset. LDA (Latent Dirichlet Allocation) is a topic modeling method in which a topic is formally defined to be a distribution over a vocabulary. For example, we might have a topic corresponding to HPMOR’s Azkaban arc, which would include words such as quirrell, dementor, azkaban, and bellatrix with high probability.

LDA assumes that documents are written according to the following process:

1. Randomly choose a distribution over topics.
2. For each word in the document:
a. Randomly choose a topic from the distribution of topics in step #1
b. Randomly choose a word from the corresponding distribution over the vocabulary

(David M. Blei 2012: Probabilistic Topic Models. Communications of the ACM. DOI:10.1145/2133806.2133826)

Of course, this isn’t the actual way that real-world documents are written, but we could kind of imagine that they were. For example, let’s imagine Eliezer Yudkowsky sitting down to write a chapter of HPMOR which he decides will mostly be about the aftermath of the Azkaban arc, and will also tie those events together with Harry’s friendship with Draco. This would correspond to step 1 in the above process: let’s say that he decides that the chapter will be 70% about the Azkaban arc and 30% about the Harry-Draco relationship.

Now he starts writing. Each word (maybe more realistically, each sentence) can be related to either the Azkaban arc or the Harry-Draco relationship, so he will alternate between those two topics as he ties them together, choosing between them with a 70-30 probability. Within either topic there are several different sub-topics he can cover, so we can think of each word associated with that topic as having some probability of being chosen. Of course, some words, like "Harry", are likely to be associated with both topics.
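To make this generative story concrete, here’s a tiny Python simulation of the process described above. The topic names, word lists, and probabilities are invented purely for illustration; they’re not taken from the actual model:

    import random

    # Two made-up topics, each a probability distribution over a tiny vocabulary.
    topics = {
        "azkaban": {"quirrell": 0.3, "dementor": 0.25, "azkaban": 0.25, "bellatrix": 0.1, "harry": 0.1},
        "draco": {"draco": 0.4, "harry": 0.3, "conspiracy": 0.15, "science": 0.15},
    }

    # Step 1: the chapter's distribution over topics (the 70%/30% split).
    topic_mixture = {"azkaban": 0.7, "draco": 0.3}

    def generate_word():
        # Step 2a: randomly choose a topic from the chapter's topic distribution.
        topic = random.choices(list(topic_mixture), weights=list(topic_mixture.values()))[0]
        # Step 2b: randomly choose a word from that topic's distribution over the vocabulary.
        words = topics[topic]
        return random.choices(list(words), weights=list(words.values()))[0]

    # "Write" a twenty-word chapter according to the LDA generative process.
    print(" ".join(generate_word() for _ in range(20)))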

When LDA is given an existing collection of documents, it then tries to reconstruct these original probabilities and distributions. In other words, it asks the question of “given this text, and given what I assume to have been the original process which generated it, which values would have been the most likely to produce this text?”. Mallet does this using Gibbs sampling: if you want to read more about that, see Wikipedia for Gibbs sampling in general or Steyvers & Griffiths (2006) for a discussion of it in the context of LDA.
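For the curious, here’s a bare-bones sketch of what collapsed Gibbs sampling for LDA looks like in Python, following the standard update rule described in Steyvers & Griffiths. This is a toy illustration rather than Mallet’s actual implementation, and the function and parameter names are my own:

    import numpy as np

    def gibbs_lda(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01):
        """docs: list of documents, each a list of integer word ids."""
        rng = np.random.default_rng(0)
        n_dk = np.zeros((len(docs), num_topics))   # how often document d uses topic k
        n_kw = np.zeros((num_topics, vocab_size))  # how often topic k generates word w
        n_k = np.zeros(num_topics)                 # total tokens assigned to topic k
        # Start from random topic assignments and tally them up.
        z = [rng.integers(num_topics, size=len(doc)) for doc in docs]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]
                    # Temporarily remove this token's assignment from the counts...
                    n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                    # ...then resample it: (how much doc d likes each topic) times
                    # (how much each topic likes this word).
                    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                    k = rng.choice(num_topics, p=p / p.sum())
                    z[d][i] = k
                    n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
        return n_kw + beta  # unnormalized topic-word distributions

The rows of the returned matrix play the role of the per-topic word distributions; their highest-weight entries correspond to the "most probable words per topic" listings below.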

But enough theory, let’s start experimenting! I start off by having Mallet extract the raw data from the documents into a form it can use, and ask it to consider 1- and 2-grams: that is, it will base its analysis both on individual words and pairs of words. Then I ask it to generate 20 topics for us, and to list the 20 most probable words in each topic.

(for all trials, I’m running LDA for 1000 iterations, re-optimizing the hyperparameters every 20 iterations, with a burn-in of 200 iterations)
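If you’d rather replicate something like this in Python instead of Mallet, here’s roughly what the pipeline might look like with gensim. This is not what I actually ran (Mallet handled the import, the n-grams, the stop words, and the hyperparameter optimization for me), and the directory name, tokenization, and bigram settings below are just placeholder assumptions:

    import re
    from pathlib import Path
    from gensim import corpora, models
    from gensim.models.phrases import Phrases

    # Load one text file per chapter (hypothetical directory layout).
    chapters = [p.read_text(encoding="utf-8").lower()
                for p in sorted(Path("hpmor_chapters").glob("*.txt"))]
    tokenized = [re.findall(r"[a-z']+", chapter) for chapter in chapters]

    # Roughly approximate the "1- and 2-grams" setting by adding detected
    # bigrams (word_word tokens) alongside the individual words.
    bigrams = Phrases(tokenized, min_count=5, threshold=10.0)
    docs = [tokens + [t for t in bigrams[tokens] if "_" in t] for t in tokenized]

    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    lda = models.LdaModel(
        corpus, id2word=dictionary, num_topics=20,
        passes=10, iterations=1000, alpha="auto",  # "auto" re-estimates the document-topic prior
    )
    for topic_id, words in lda.show_topics(num_topics=20, num_words=20, formatted=False):
        print(topic_id, " ".join(word for word, _ in words))

Note that gensim’s LdaModel uses variational inference rather than Gibbs sampling, so its output won’t match Mallet’s exactly even on the same data.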

Here are the initial results:

0 0,02579 phoenix wizard fawkes war blaise millicent zabini black_mist mist wizard_voice envelope million save haukelid back_sleep phoenixes violence tower bulstrode
1 0,02547 dad verres petunia mum eraser felthorne books michael evans parents atoms verres_evans rianne mother transfigure miss_felthorne father experiment michael_verres
2 0,03884 snake iss hissed defense_professor hagrid sstone mr_hagrid unicorn thiss musst ssay monster defense chamber sspeak chamber_secrets secrets bed slytherin_monster
3 0,031 severus minerva potions_master albus potions lesath severus_snape master lestrange professor_snape snape neville fred lesath_lestrange time_turner george azkaban gryffindors discipline
4 0,04833 quirrell professor_quirrell professor mr_potter mr quirrell_voice quirrell_harry goyle potter_professor mr_goyle classroom lesson quirrell_face slytherins quirrell_points battle_magic derrick lose skeeter
5 0,04691 draco father draco_harry draco_didn draco_voice harry_draco ron science conspiracy draco_couldn draco_draco platform draco_nodded draco_looked mother station rival draco_turned muggleborns
6 0,03858 professor_mcgonagall mcgonagall galleons mr_potter gold transfiguration bag shop coins wizarding witch malkin alley money wizarding_world madam_malkin street kit gringotts
7 0,02716 bellatrix dementors azkaban amelia snake patronus metal bahry broomstick auror charm aurors corridor quirrell bellatrix_black hissed hole iss cell
8 0,03185 troll hagrid weasley forest centaur yeh tracey tick unicorn broomstick filch mr_hagrid weasley_twins twins forbidden_forest rubeus argus george fred
9 0,03086 voldemort lord mirror lord_voldemort stone dark_lord altar tom perenelle tom_riddle dark parseltongue gun horcrux riddle child iss sshall hissed
10 0,03643 malfoy lucius lucius_malfoy wizengamot lord_malfoy debt house_malfoy draco_malfoy son house_potter thousand_galleons veritaserum false_memory lies murder hall galleons podium troll
11 0,04617 dementor headmaster patronus fear patronus_charm cast_patronus chocolate cage corporeal headmaster_harry patronuses harry_headmaster seamus anthony happy corporeal_patronus happy_thought harry_wizard warm
12 0,02038 draco magic fred paper dr george powerful test harry_potter fading fred_george magic_fading skeeter rita dr_potter blood scientist shadowy spells
13 0,02188 draco general soldiers neville sunshine chaos army dragon zabini battle granger armies doom_doom doom malfoy dragon_army longbottom forest dragons
14 0,02725 moody elder_wand dawn elder experiment lesath aftermath ravenclaw_common horizon peverell graveyard vow milgram philosopher_stone bellatrix_black narrow labeled unicorn hermione_nodded
15 0,02818 moody lupin mad_eye eye prophecy mad amelia remus mr_lupin monroe voldemort bones albus minerva amelia_bones alastor line eye_moody lily
16 0,03993 daphne susan tracey hannah lavender bully bullies draco_malfoy greengrass year millicent corridor girl parvati bones sprout professor_snape davis susan_bones
17 0,04419 granger miss_granger hermione miss padma hero patil heroes padma_patil professor_sinistra hermione_voice girls sinistra humming witches hermione_didn cell girl hero_hermione
18 0,02098 hat sorting game points goyle neville sorting_hat note comed_tea comed paper slytherins ha_ha mr_goyle remembrall ha tea ernie madam_hooch
19 1,37501 harry professor potter hermione voice time didn back quirrell dumbledore professor_quirrell mr don thought boy dark wasn hogwarts eyes

Not bad. The initial topics are a bit of a mixed bag, but they get better later on. The 0th topic seems to roughly be about the war. The 1st is mostly about Harry’s parents, but somewhat oddly, Rianne Felthorne gets included in the same topic.

Number 2 is interesting: it’s picking up Parseltongue words as being associated with the Defense Professor. This makes sense, because he occasionally speaks in Parseltongue, so if he’s present in a chapter, it’s also more likely that Parseltongue words will be present. Apparently Parseltongue words are also associated with unicorns and Hagrid, because both show up in this topic.

Number 3 seems to start out as a "senior staff of Hogwarts" topic, with Snape, McGonagall, and Dumbledore included (but not Quirrell, interestingly enough), but then also has mentions of George, Azkaban, and Gryffindors at the end. Number 4 is clearly about Quirrell, and to a lesser extent Slytherins.

Number 5 seems to be the Draco-Harry chapters, and among the more informative words includes 2-grams such as "draco_nodded", "draco_looked", and "draco_turned". As an interesting observation, besides one hermione_nodded in topic number 14, Draco seems to be the only character whose nods, looks, or turns were picked up by the modeler: I wonder what’s up with that. Number 6 involves McGonagall, Harry, and Harry’s money; number 7 looks to be the Azkaban arc. Number 8 is a topic combining Hagrid, the Forbidden Forest, and apparently also the twins. And so on.

This looks pretty good, but we could try varying the number of topics. Also, Mallet allows me to add a list of words to ignore in the analysis. By default, it already ignores words like the, is, at, and so on. Let’s add a few: “didn didn’t couldn couldn’t nodded looked turned said wasn wasn’t ‘t t”
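In the gensim-style sketch from earlier, the equivalent of Mallet’s ignore list would simply be filtering those words out of the token lists before building the dictionary; a minimal sketch:

    extra_stopwords = {"didn", "didn't", "couldn", "couldn't", "nodded",
                       "looked", "turned", "said", "wasn", "wasn't", "t"}
    # Drop the extra stop words before Dictionary/doc2bow (Mallet handles this
    # through its own stop-list options instead).
    docs = [[token for token in doc if token not in extra_stopwords] for doc in docs]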

New results:

0 0,03768 hagrid troll weasley forest mr_hagrid centaur yeh unicorn tracey tick weasley_twins filch twins broomstick forbidden_forest rubeus fred forbidden argus
1 0,04613 snape potions_master professor_snape quidditch sprout potions professor_sprout felthorne severus_snape master mirror severus rianne game susan plant susan_bones miss_felthorne exam
2 0,04554 dementor patronus headmaster phoenix fear patronus_charm fawkes chocolate patronuses cage wise corporeal seamus harry_headmaster star anthony wizard_voice souls corporeal_patronus
3 0,02705 fawkes moody envelope comed_tea comed tea experiment bellatrix_black hat lesath pillow train milgram prefect compartment drink frodo cards experimental
4 0,03366 voldemort lord dark_lord lord_voldemort iss tom stone altar hissed dark horcrux wand perenelle riddle tom_riddle thiss parseltongue vow gun
5 0,0315 bellatrix azkaban dementors snake amelia patronus metal broomstick bahry auror charm professor_quirrell quirrell corridor woman aurors hissed iss hole
6 0,0284 severus minerva neville hat lesath sorting lestrange sorting_hat fred george lesath_lestrange severus_snape legilimens fred_george discipline severus_voice handsome professor_snape points_ravenclaw
7 0,04123 daphne susan tracey hannah lavender girl bullies bully hermione greengrass girls millicent parvati draco_malfoy padma slytherin jugson davis corridor
8 0,02515 draco soldiers neville sunshine general chaos army dragon granger zabini battle malfoy armies doom doom_doom dragon_army forest shield dragons
9 0,01942 draco magic fred harry_potter dr paper george fading fred_george skeeter test magic_fading powerful rita blood dr_potter scientist wizards shadowy
10 0,05943 quirrell professor_quirrell professor mr_potter quirrell_harry mr quirrell_voice chamber potter_professor lose quirrell_face battle_magic lesson secrets snake derrick salazar monster quirrell_smiling
11 0,02807 mirror transfiguration transfigure eraser flamel atoms ball harry_hermione page hermione_voice separate sentient frame plants subject solid pig free_transfiguration objects
12 0,03587 hermione granger miss_granger miss padma hero heroes patil padma_patil elder_wand elder hermione_voice humming professor_flitwick protest cell mysterious_wizard professor_sinistra sinistra
13 0,03058 albus moody minerva severus voldemort prophecy eye amelia mad mad_eye potions_master bones monroe alastor headmistress amelia_bones eye_moody potions mark
14 0,03796 mcgonagall professor_mcgonagall parents dad mum verres gold evans galleons petunia bag christmas michael father trunk shop wizarding verres_evans coins
15 0,03985 goyle mr_goyle points slytherins defence paper ha game ha_ha note classroom remembrall pie hooch neville ernie bars boys madam_hooch
16 0,04854 draco father draco_harry ron science draco_voice platform conspiracy harry_draco mother pettigrew slytherin_house patronus_charm train draco_eyes patronus draco_don narcissa station
17 0,03689 malfoy lucius lucius_malfoy wizengamot lupin lord_malfoy remus debt mr_lupin son house_malfoy house_potter james galleons veritaserum mad false_memory longbottom vote
18 1,40563 harry professor potter voice hermione time back dumbledore quirrell mr professor_quirrell don thought boy dark hogwarts eyes face lord
19 0,03283 blaise millicent country zabini black_mist mist traitors hospital violence professor_voice harry_wizard pedestals jugson leader lord_jugson wishes shrug blue_light lucius_malfoy

The order of topics is now somewhat different. The Draco/Harry science chapters, which were previously topic number 5, now look to be topic 16: they seem a little less distinct now that we told the program to remove words like "nodded", "looked", and "turned", which were previously associated with Draco, and probably with Draco talking to Harry in particular. Having fewer words that co-occur when Harry and Draco specifically are talking makes "Harry and Draco talking" a less distinct cluster. Maybe we shouldn’t have asked the program to ignore those words. I’ll take them off the ignore list.

What happens if we try 10 or 30 topics?

Here are the results with 10:

0 0,0553 hat transfiguration sorting goyle mr_goyle sorting_hat points transfigure class defence eraser note game paper professor_mcgonagall ha classroom ha_ha shadowy
1 0,05378 severus minerva azkaban albus fawkes phoenix lesath lestrange bellatrix severus_snape moody potions_master lesath_lestrange bellatrix_black neville envelope alarm hours severus_voice
2 0,06163 professor_quirrell quirrell professor mirror lord_voldemort voldemort stone defense_professor defense perenelle quirrell_harry parseltongue chamber flamel quirrell_voice tom sprout horcrux quirrell_face
3 0,04948 bellatrix snake voldemort azkaban dementors iss hissed amelia wand patronus lord bahry dark_lord broomstick metal charm altar auror dark
4 0,0406 malfoy moody lucius wizengamot lucius_malfoy albus eye mad lord_malfoy amelia mad_eye azkaban minerva amelia_bones eye_moody debt monroe alastor line
5 0,06065 hagrid dementor troll lupin forest remus mr_lupin tracey mr_hagrid centaur yeh unicorn tick filch huge james weasley elder_wand rubeus
6 0,04647 draco soldiers general sunshine army chaos dragon zabini neville battle granger malfoy armies blaise doom_doom dragon_army dragons dr father
7 0,05774 father professor_mcgonagall parents mum dad money fred galleons george ron verres gold rita science skeeter books trunk evans bag
8 1,55479 harry professor potter voice hermione time back quirrell dumbledore mr professor_quirrell don thought boy dark draco hogwarts eyes harry_potter
9 0,05979 daphne susan hermione tracey hannah padma girl bullies lavender girls millicent bully greengrass miss davis parvati susan_bones hero jugson

0 jumps out at once: it looks like the sorting hat is now a major topic! But on closer inspection, this might be an artifact of the 1-gram and 2-gram versions of it being double-counted: "hat", "sorting", and "sorting_hat" are all included in the same topic. If we were to remove "hat" and "sorting", the topic would become "transfiguration goyle mr_goyle sorting_hat points transfigure class defence eraser note game paper professor_mcgonagall ha classroom ha_ha shadowy", which makes the topic look a lot less coherent. Notice that "goyle" also gets double-counted, with "goyle" and "mr_goyle".

In general, most of these topics don’t look like they would correspond to any clear "real" topic, though there are a few exceptions, like number 6 being related to the Quirrell Armies. The double-counting is also pretty prominent throughout.

It seems useful to stop and reflect on why these results are now so bad. Here’s what I think: there are a lot of different events and storylines in HPMOR, each associated with their own specific vocabulary. For instance, Rianne Felthorne, who was picked up in the 20-topic version, only appears in chapters 71, 76, and 79. If you tell the model to assume that there are a lot of topics, then it might actually come up with the hypothesis that there’s a topic which covers those three chapters and which has a very high probability of talking about Rianne. But with a low number of topics assumed, it can’t "waste" any topics by dedicating them to such rare words. Instead, in order to cover most of the documents, it has to assume that Rianne is part of some much bigger topic which spans a lot of chapters. Since Rianne only appears in three chapters, such a wide-spanning topic would have to have a very low probability of generating Rianne’s name. This means that the topics become dominated by words which appear pretty often in the text, and in a lot of different contexts – but of course that makes the topics less distinctive and meaningful. The only distinctive topics will be those that are major enough to span several chapters, which is the case for the Quirrell armies.

So how about the opposite direction, with 30 topics?

0 0,02023 hagrid troll forest tracey centaur yeh broomstick tick unicorn filch weasley forbidden_forest mr_hagrid huge forbidden unicorns argus rubeus half_giant
1 0,01219 draco harry_potter magic dr wizards powerful paper blood test father fading figure magic_fading spells dr_potter scientist muggles ll scientists
2 0,02164 elder pettigrew elder_wand hero vow rat dawn sirius_black prophecies rival sirius unicorn revived fingernails horizon hermione_harry girl_revived back_dead rooftop
3 0,02732 severus mum dad lesath verres evans parents father petunia lestrange neville michael verres_evans letter michael_verres lesath_lestrange books window roberta
4 0,02912 quirrell professor_quirrell professor mr_potter mr goyle classroom mr_goyle quirrell_harry quirrell_voice lose potter_professor skeeter quirrell_face quirrell_points derrick slytherins rita_skeeter rita
5 0,02385 mcgonagall professor_mcgonagall gold galleons mr_potter shop bag coins parents alley diagon malkin sighed wizarding_world street wizarding madam_malkin trunk pouch
6 0,02138 draco general soldiers neville sunshine chaos army dragon zabini battle granger malfoy armies doom doom_doom dragon_army dragons longbottom shield
7 0,01796 azkaban phoenix moody bellatrix fawkes envelope bellatrix_black aftermath lesath experiment harry_stared amelia milgram black_azkaban frodo bird clock pillow mask
8 0,02302 auror amelia defense_professor amelia_bones duel department mr_malfoy exam false_memory grade charmed law_enforcement beauxbatons trophy_room trophy enforcement magical_law department_magical memory_charm
9 0,02788 miss miss_granger granger padma hero heroes patil padma_patil professor_flitwick humming witches hermione girl hermione_voice cell professor_sinistra sinistra mysterious_wizard hero_hermione
10 0,0357 draco father ron draco_harry conspiracy platform draco_voice sad harry_draco station draco_nodded draco_turned mother narcissa lucius haired draco_eyes revenge slytherin_house
11 0,02313 responsible wards troll gryffindor head_table twins weasley hall weasley_twins minerva mr_hagrid great_hall cracked blame storeroom sinistra hagrid jugson year_witch
12 0,02151 malfoy lucius lucius_malfoy lord_malfoy wizengamot house_malfoy debt son house_potter house thousand_galleons ancient galleons plum_colored plum colored goblin colored_robes troll
13 0,00797 moody eye prophecy dark mad mad_eye dark_lord monroe albus mark mcgonagall scarred severus evidence lord david dark_mark scarred_man eye_moody
14 0,02847 voldemort dark_lord lord dark wand altar child gun iss hissed stone body vow master lord_voldemort girl_child apokatastethi graveyard sshall
15 0,02275 bellatrix dementors azkaban amelia broomstick snake metal bahry auror corridor professor_quirrell quirrell charm patronus bellatrix_black woman hole cell iss
16 0,02847 quidditch snape sprout professor_snape professor_sprout potions_master game bones susan plant susan_bones philosopher_stone mirror potions cedric snitch chamber broomstick tendrils
17 0,02856 lupin remus mr_lupin james lily remus_lupin peter nuclear stars star children_children haukelid tower edge million script ravenclaw_tower soft_voice godric_hollow
18 0,03067 dementor patronus headmaster patronus_charm fear patronuses chocolate cage corporeal happy cast_patronus presence anthony dementors expecto_patronum corporeal_patronus seamus happy_thought harry_headmaster
19 0,02245 snake iss hissed defense_professor hagrid mr_hagrid chamber infirmary unicorn monster chamber_secrets secrets slytherin_monster sstone sspeak yess ssay hissed_harry parseltongue
20 0,04184 daphne susan tracey hannah hermione girl lavender bully bullies greengrass girls parvati millicent davis slytherin corridor bones padma susan_bones
21 0,01251 wizard blaise millicent zabini war black_mist mist harry_wizard gregory violence jugson oaken_door pedestals bulstrode lord_jugson wizard_voice black_cloak half_moon black_hat
22 0,0332 hermione boy library book pages sentient page plate chocolate year_girl train talk flamel plants experiment compartment century research snakes
23 0,02155 hat sorting tea game sorting_hat comed note points ha ravenclaw comed_tea ha_ha pie neville bars paper largest hufflepuffs slytherins
24 0,01522 professor_quirrell quirrell mirror lord_voldemort voldemort dumbledore stone perenelle tom cauldron potion horcrux albus_dumbledore parseltongue tom_riddle riddle flamel david_monroe monroe
25 0,02873 severus minerva albus snape amelia voldemort potions_master bones potions master amelia_bones headmistress felthorne merlin moody severus_snape rianne professor_snape madam_bones
26 1,34252 harry professor potter hermione voice time quirrell back professor_quirrell dumbledore mr don thought boy dark hogwarts eyes face lord
27 0,01638 dumbledore goyle mr_goyle remembrall turner paper ah ernie discipline gargoyle madam_hooch hooch rock neville_remembrall points_ravenclaw thursday swamp gregory_goyle chicken
28 0,02038 transfiguration fred george transfigure fred_george eraser atoms skeeter rita twins ball minerva rita_skeeter flume impossible collection separate weasley_twins subject
29 0,02581 pansy traitors generals chant prismatic_wall wishes country samuel male_voice male audience crush vow pretty luminos_shouted parkinson luminos gate halls

Hmm. Not sure if this is so great, either: now we might have the opposite problem, where 30 topics gives the model too much freedom, and it can hypothesize all kinds of mini-topics that aren’t actually there. I’m pretty sure that one *could* come up with 30 coherent topics if one did it manually, but that would require using more structure than a basic form of LDA is capable of using.

So 20 topics was probably best. Out of curiosity, what would it look like if we only considered 1-grams? That would eliminate some double-counting, but would it actually improve the results?

0 0,24225 albus severus voldemort moody mr minerva dark master prophecy lord eye mcgonagall potter potions bones azkaban mad snape monroe
1 0,20082 harry bellatrix azkaban professor quirrell snake dementors amelia metal charm auror bahry aurors lord woman wizard defense broomstick corridor
2 2,48546 voice boy time back looked eyes turned hand head door hogwarts place face heard words moment black robes stood
3 0,16911 draco granger neville general soldiers sunshine chaos army malfoy battle dragon zabini hermione armies shield blaise longbottom doom fight
4 0,44362 harry patronus dementor death charm light stars wand voice cast fear dementors die silver wouldn happy died bright aurors
5 0,21435 professor harry points mcgonagall mr time game slytherin ravenclaw goyle desk neville students year slytherins sprout classroom note quidditch
6 2,01788 thought dark mind life time lord dumbledore part man thing long power stop knew great world understand side true
7 0,64986 professor quirrell mr defense potter dark students lord miss spell true obvious room headmaster slytherin snape lose today slytherins
8 0,08225 voldemort harry lord stone dark mirror iss wand hissed altar riddle tom child horcrux parseltongue death dumbledore perenelle white
9 0,98236 harry wand hand air sense spell broomstick left ground fire body hit cloak mind red pouch moving pointed back
10 0,1403 hat sort ron sorting secrets tea book slytherin comed table talk neville train snake drink rule secret carriage pages
11 0,12296 hermione transfiguration lupin transfigure remus mr wand minerva mcgonagall eraser form tiny peter pettigrew brain atoms separate wood steel
12 0,38543 dumbledore headmaster wizard phoenix albus fawkes eyes fire flitwick war stone cloak mcgonagall office wizards understand shoulder back desk
13 0,33105 hermione granger miss professor mcgonagall defense hogwarts ve hero hagrid mr head ll year tracey forest heroes girl centaur
14 0,15878 hermione daphne susan tracey slytherin snape girl padma hannah year malfoy potions lavender bullies table miss house greengrass millicent
15 0,15193 severus weasley neville george fred minerva students twins lesath table snape mr skeeter tick rita gryffindor lestrange potions man
16 0,26817 draco father magic slytherin blood malfoy powerful ll wizards test figure paper potter spells lost fading muggles dr mother
17 2,62309 harry potter don people ve things make face good ll wouldn hogwarts made sort wanted thought thing put point
18 0,31149 malfoy lucius house granger son wizengamot hogwarts potter dumbledore lord chair lived ancient debt murder aurors magical britain room
19 0,25969 mcgonagall professor parents mr father evans verres mum dad witch galleons money gold books magic world mother family wizarding

I’d say that’s definitely worse: I have difficulty picking out anything sensible, though it’s interesting to look at what *does* remain identifiable. The Quirrell Armies show up once again, in topic number 3. They’re definitely the most resilient topic in the whole story. There are also a few others, like number 8, which is strongly related to Vold… He-Who-Shall-Not-Be-Named.

(I also tried whether 30 topics would work better for 1-grams; I won’t show you the results, because the answer was "not really".)

What if we only considered 2-grams? That’s going to produce a mess, but I’m still curious to see what it looks like. Also, I want to see whether our hero, the Quirrell Armies, manages to survive that challenge as well!

0 0,01128 sorting_hat comed_tea points_ravenclaw severus_voice lesath_lestrange potter_severus gryffindor_table harry_sat older_student trimmed_robes whisper_whisper school_discipline severus_smiling severus_face potions_professor students_looked red_trimmed black_robed perfect_occlumens
1 0,01493 potions_master professor_snape miss_felthorne false_memory severus_snape professor_sprout memory_charm rianne_felthorne empty_air sorting_hat theodore_nott attempted_murder trophy_room susan_bones snape_voice cedric_diggory wards_hogwarts felthorne_snape albus_quietly
2 0,01344 fred_george rita_skeeter mr_hagrid chamber_secrets slytherin_monster hissed_snake hissed_harry pale_blue miss_skeeter heir_slytherin source_magic mary_place green_snake rich_people imperius_curse people_sort solving_groups problem_solving order_chaos
3 0,0083 doom_doom dragon_army general_potter chaos_legion general_granger mr_goyle sunshine_regiment sunshine_soldiers draco_malfoy general_malfoy blaise_zabini sleep_hex sunshine_general neville_longbottom prisoner_dilemma mrs_davis dragon_general mr_thomas mr_mrs
4 0,01159 dark_lord mad_eye eye_moody mr_grim girl_child lord_voldemort mr_white death_eater dark_mark apokatastethi_apokatastethi scarred_man mr_moody voldemort_voice harry_scar high_voice april_pm voldemort_hissed apokatastethi_soma lord_spoke
5 0,01227 mr_goyle ha_ha older_slytherins cereal_bars largest_slytherin student_classroom mr_crabbe quirrell_points current_points dangerous_student martial_arts game_controller snapped_fingers green_study wearing_pyjamas box_cereal hint_hint hermione_mind ha_su
6 0,01779 professor_quirrell bellatrix_black defense_professor harry_thought metal_door guardian_charm thought_harry bellatrix_professor dark_lord muggle_device patronus_charm hole_wall harry_brain partial_transfiguration shadows_death harry_knew harry_turned life_eaterss green_spark
7 0,00985 amelia_bones bellatrix_black madam_bones mad_eye minerva_mcgonagall eye_moody chief_warlock line_merlin alastor_moody headmistress_mcgonagall merlin_unbroken black_azkaban harry_james harry_stared peter_pettigrew potter_evans order_phoenix muggle_weapons lesath_lestrange
8 0,00967 lord_voldemort tom_riddle baba_yaga david_monroe answer_parseltongue wizarding_war great_creation blackened_fire az_reth nicholas_flamel quirrell_dropped back_professor quirrell_looked professor_quirrell quidditch_game obtain_sstone lay_bed horcrux_spell harry_aloud
9 0,01368 seventh_year salazar_slytherin general_granger susan_bones draco_malfoy slytherin_ghost sunshine_general year_girl year_boy miss_davis fourth_year sixth_year ancient_house hufflepuff_girl hermione_harry doom_doom slytherin_girl daphne_greengrass ravenclaw_girl
10 0,01569 professor_mcgonagall mr_goyle madam_malkin mokeskin_pouch madam_hooch neville_remembrall diagon_alley gold_coins older_witch bag_gold healer_kit shake_hand mcgonagall_face gregory_goyle mcgonagall_sighed gold_harry cavern_level genetic_parents gold_silver
11 0,32335 professor_quirrell harry_potter mr_potter defense_professor professor_mcgonagall dark_lord hermione_granger harry_voice miss_granger draco_malfoy boy_lived albus_dumbledore patronus_charm professor_flitwick harry_looked shook_head harry_thought mr_malfoy harry_harry
12 0,01027 harry_wizard black_mist wizard_voice resurrection_stone harry_headmaster moon_glasses black_cloak lord_jugson oaken_door albus_dumbledore wizard_face black_hat headmaster_harry wizard_quietly death_eater save_lives dumbledore_voice blue_eyes pretending_wise
13 0,00772 hermione_voice elder_wand harry_hermione hermione_harry free_transfiguration unbreakable_vow liquid_gas transfigure_liquid narrow_keyhole start_year metal_ball hermione_nodded collection_atoms muggle_science ve_thinking unicorn_princess time_narrow girl_revived living_subject
14 0,02155 mr_lupin verres_evans michael_verres remus_lupin professor_verres professor_michael comed_tea harry_father evans_verres living_room cross_station letter_hogwarts godric_hollow christmas_eve parents_harry son_harry mr_bronze leo_granger dad_mum
15 0,01017 warm_happy back_sleep lord_voldemort albus_dumbledore state_mind ravenclaw_tower expecto_patronum long_ago golden_frame tattered_cloak corporeal_patronus red_gold light_years golden_back lay_beneath auror_goryanof master_flamel quirrell_pointed true_love
16 0,01284 mr_hagrid weasley_twins forbidden_forest half_giant great_hall tick_harry weasley_twin huge_man argus_filch rubeus_hagrid part_mind head_table unicorn_blood gryffindor_table fred_george magical_creatures false_memory ron_weasley fred_weasley
17 0,01825 harry_potter draco_voice magic_fading dr_potter draco_harry harry_draco shadowy_figure dr_malfoy death_eater draco_don draco_draco powerful_wizards green_light blood_purism draco_realized potter_draco don_draco fading_world paper_magic
18 0,01634 lucius_malfoy lord_malfoy house_malfoy house_potter plum_colored draco_malfoy thousand_galleons colored_robes dark_stone madam_longbottom ancient_hall blood_debt chief_warlock hundred_thousand noble_ancient malfoy_stood debt_owed lords_ladies hall_wizengamot
19 0,01762 miss_granger padma_patil hermione_voice professor_sinistra hero_hermione year_witch penelope_clearwater mysterious_wizard chaos_legion professor_vector hermione_turned amelia_bones endless_stair people_ve harry_friend beneath_half ravenclaw_girl common_sense leather_folder

The armies show up *very* distinctively as topic number 3. An interesting topic is number 12, which looks like it might involve Harry’s and Dumbledore’s debates about death and mortality, given the presence of 2-grams like “resurrection_stone, harry_headmaster, albus_dumbledore, wizard_face, wizard_quietly, death_eater, save_lives, dumbledore_voice, pretending_wise” (if some of these seem confusing, remember that Mallet ignores very common words by default, so e.g. pretending_wise was probably “pretending to be wise” in the raw text).

Still, it seems like 20 topics with 1- and 2-grams is best. Let’s generate that kind of classification again, and this time also have the classifier tell us what percentage of each chapter is made up of each topic.
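In Mallet this comes from its per-document topic output; with the gensim-style model from the earlier sketch, the per-chapter proportions could be pulled out roughly like this (again an illustrative sketch, not the command I actually ran):

    # Print each chapter's three largest topic proportions.
    for i, bow in enumerate(corpus):
        mix = lda.get_document_topics(bow, minimum_probability=0.01)
        mix.sort(key=lambda pair: -pair[1])
        top = ", ".join(f"topic {k}: {p:.1%}" for k, p in mix[:3])
        print(f"Chapter {i + 1}: {top}")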

Here are the topics:

0 0,03474 moody eye monroe mad_eye mad voldemort amelia prophecy bones david amelia_bones albus david_monroe minerva eye_moody alastor line azkaban voldie
1 0,02881 draco father harry_potter blood dr draco_voice magic test muggles powerful paper wizards draco_harry fading scientist spells harry_draco magic_fading scientists
2 0,02842 miss_granger miss hermione hero heroes granger hermione_granger elder_wand elder humming sinistra hermione_voice cell mysterious_wizard professor_sinistra fingernails vow sparkling professor_vector
3 0,02272 hat sorting neville sorting_hat goyle note ha points slytherins remembrall game mr_goyle paper ha_ha comed ernie comed_tea defence rock
4 0,05541 quirrell professor_quirrell professor mr_potter mr lose quirrell_voice goyle mr_goyle lesson quirrell_harry quirrell_face potter_professor secrets monster quirrell_nodded quirrell_points derrick quirrell_looked
5 0,02989 father dad mum books ron verres evans science petunia parents platform verres_evans michael trunk scarf letter train son owl
6 0,03465 malfoy lucius lucius_malfoy lord_malfoy wizengamot debt son house_malfoy house_potter false longbottom colored podium thousand_galleons plum_colored plum false_memory law owed
7 0,03928 daphne susan tracey hannah snape lavender bullies bully professor_snape draco_malfoy greengrass millicent bones sprout parvati corridor susan_bones girl davis
8 0,03814 hagrid troll forest unicorn tracey mr_hagrid centaur yeh tick filch weasley broomstick rubeus forbidden_forest huge forbidden twins unicorns argus
9 0,03042 draco neville soldiers general sunshine chaos army dragon granger zabini battle malfoy armies doom_doom doom dragons forest dragon_army shield
10 0,03238 voldemort lord lord_voldemort mirror dark_lord stone iss altar tom horcrux riddle parseltongue hissed wand perenelle tom_riddle dark body gun
11 0,04766 fred george neville fred_george lesath skeeter weasley rita severus twins rita_skeeter lestrange weasley_twins lesath_lestrange gryffindors handsome legilimens flume occlumency
12 0,05025 padma girls girl patil pettigrew padma_patil table responsible rival astorga pansy granger rumor ravenclaw_table heroine rat morning madam_pomfrey year_witch
13 0,03567 bellatrix snake dementors azkaban amelia patronus broomstick professor_quirrell bahry metal quirrell auror charm woman iss hissed corridor aurors bellatrix_black
14 0,0177 phoenix fawkes war blaise aftermath millicent envelope azkaban moody black_mist mist zabini haukelid wizard_voice back_sleep tower gregory million violence
15 0,04214 dementor patronus lupin headmaster remus patronus_charm mr_lupin james lily cast_patronus godric cage corporeal patronuses fear death happy anthony chocolate
16 0,03449 severus minerva albus potions_master potions master snape severus_snape time_turner turner headmistress floo azkaban professor_snape discipline severus_voice points_ravenclaw escape headmaster_office
17 0,04336 mcgonagall professor_mcgonagall galleons gold alley shop bag mr_potter pouch diagon_alley coins diagon wizarding_world malkin witch vault wizarding street kit
18 1,37162 harry professor potter hermione voice time back dumbledore quirrell professor_quirrell mr don thought boy dark hogwarts eyes face lord
19 0,02203 transfiguration transfigure eraser atoms minerva page ball harry_hermione separate sentient hermione_voice library subject diamond collection snakes pig free_transfiguration research

To make things easier, I’m going to give each of those topics a more descriptive name. I went with these:

0: Mad-Eye Moody & David Monroe
1: Harry & Draco doing science together
2: Hermione
3: Sorting Hat & Mr. Goyle
4: Professor Quirrell
5: Harry’s parents
6: Lucius Malfoy & Harry’s debt
7: Daphne, Susan & the bullies
8: Hagrid & the Forest
9: Quirrell Armies
10: Lord Voldemort
11: Fred & George
12: Padma Patil and stuff
13: Azkaban Arc
14: Random
15: Dementors & Patronuses
16: Albus, Minerva, and Snape
17: Diagon Alley & Money
18: Generic (this topic makes up by far the largest proportion of the story: it has a weight of 1,37 whereas none of the others reach even 0,06. You could call it the “whatever doesn’t fit into one of the other topics” topic)
19: Transfiguration

That’s not too bad of a list of topics in HPMOR, though the proportion of the “generic” topic is kinda annoying. Here are some of the topic classifications the model gives us (only the largest percentages shown):

Chapter 1, A Day of Very Low Probability: 57,8% Harry’s Parents, 42,1% Generic
Chapter 2, Everything I Believe Is False: 47,5% Generic, 26,2% Diagon Alley & Money, 25,2% Harry’s Parents
Chapter 3, Comparing Reality To Its Alternatives: 49,6% Generic, 39,6% Diagon Alley & Money, 7% Harry’s Parents
Chapter 4, The Efficient Market Hypothesis: 59,7% Diagon Alley & Money, 40% Generic
Chapter 5, The Fundamental Attribution Error: 49,8% Diagon Alley & Money, 49,4% Generic
Chapter 6, The Planning Fallacy: 50,5% Generic, 47,9% Diagon Alley & Money

These topic classifications initially go roughly as one might expect, though the topic we termed "Diagon Alley & Money" shows up as early as Chapter 2, even though they only get to the Alley in Chapter 3.

Chapter 7, Reciprocation: 46,1% Generic, 42,0% Harry’s Parents, 10,2% Harry & Draco doing science together

After that it stays strong until Chapter 7, where it disappears entirely as the story moves away from the Alley to King’s Cross Station, Harry’s parents say their goodbyes, and Harry runs into Draco, among others.

Chapter 8, Positive Bias: 52,4% Generic, 38,5% Harry’s Parents, 4,4% Sorting Hat & Mr Goyle (1.3432768379668802E-5 Hermione)

But then there’s Chapter 8, where Harry and Hermione have an extended discussion: besides Generic, this is classified as mostly being about Harry’s Parents (???), and a little bit about the weirdball "Sorting Hat & Mr. Goyle"; the topic we had named "Hermione" comes in at a vanishingly small fraction.

Chapter 9, Title Redacted, Part I: 50,3% Generic, 41,6% Sorting Hat & Mr. Goyle, 8,02% Fred & George

Chapter 9 is where people are sorted (and Fred & George make a minor appearance). It’s interesting to notice that Chapter 8 had a bit of Sorting Hat content, even though nothing about the sorting was mentioned there: we also previously saw that the Diagon Alley classification showed up even before they went to Diagon Alley.

But now I need to leave work, so no time to do more analysis at this point. If anyone wants to do more analysis, the full results are here: http://pastebin.com/bGip7X4D

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)

Wednesday, April 29th, 2015
10:17 am - Teaching economics & ethics with Kitty Powers’ Matchmaker

Unusual ways to teach economics. I’m currently playing Kitty Powers’ Matchmaker, a silly but fun little game in which you run a dating agency and try to get your clients on successful dates and, eventually, into a successful relationship.

Now one way of playing this would be to just prioritize the benefit of each client, trying to get them into maximally satisfying relationships as fast as possible. But while I sometimes do that, often I do things differently.

In one case, I had a client who’d been on two bad dates already, and was threatening to march out and give my company a bad reputation if she had one more bad date. I didn’t have any good matches lined up for her. I could have just kicked her out, but that wouldn’t have given my company any money. So instead I put her on a date with someone who seemed incompatible, but just had her lie about all the incompatibilities and say what the other person wanted to hear. That way, they’d end up together, and I’d get my money and be rid of the troublesome client. Of course I knew that they’d break up later and that would hurt my reputation a bit, but I figured that it would still be better for the company than kicking her out now.

(In my defense, I have only done this once, and I felt kinda bad about it.)

This situation is known in economics as the principal-agent problem: a situation where someone (the "principal") hires someone else (the "agent") to do something on the principal’s behalf, but the self-interests of the principal and the agent differ. So for example, you may hire a real estate agent to sell your house and give them a cut of the profit. It would be in your interest for the agent to sell it for as high a price as possible, but the agent may benefit more from spending less time on each individual sale, selling a lot of houses more cheaply but more quickly. This was confirmed in a study which found that real estate agents tended to sell other people’s houses considerably faster and cheaper than they sold their own.

Or, you might go to a matchmaking agency to get into the relationship of your dreams, but your matchmaker also has an interest in getting your money and benefiting the company.

Here’s another thing that I do in the game that some might consider questionable. When a client comes in, they will tell me their personality traits, e.g. introvert vs. extrovert. It’s best to pair them off with someone who has the same personality traits. But when the game shows me a list of people I could try to match my client with, by default I don’t know the personality traits of those people. Instead, I have to send some client on a date with them to discover their traits, and only then do I learn them too.

Now suppose that a new client comes in, and I know of someone I could have them date who’d be perfectly compatible. I also have a bunch of other possibilities, whose personality traits I don’t know. Do I send my client on the best possible date right away? Of course not! Instead, I’ll send them on a few dates with the unknowns, so that I can discover their personality traits, and only after a few bad dates will I pair my client with the best match. This way, I’ll know the personality traits of as many people as possible, and will always know of a compatible match for my next client.

Is this ethical? You could argue either way. Yes: I’m still sending my client to a good relationship eventually, and although it might give my client a few bad dates in the beginning, that helps other clients eventually get a good date. No: I have an obligation to prioritize the interest of my current client at all times, and it’s not in their interest to have a bad time. The first argument has a bit of a consequentialist vibe, and the second one has a bit of a deontologist vibe. If you were teaching an introductory ethics course and wanted to give your students a different example than the usual ones, maybe you could have them play the game and then ask them this question.

Comedy dating sims: useful for teaching both economics and ethics.

Originally published at Kaj Sotala. You can comment here or there.

(Leave an echo)
