A view to the gallery of my mind
Tuesday, November 29th, 2016
11:30 am - Suddenly, a taste of freedom
So a few days back, I mentioned that after getting rid of my subconscious idealized assumptions of what a relationship “should” be like, I stopped being so desperate to be in a relationship. And some time before that, I mentioned that I’d decided to put the whole “saving the world” thing on hold for a few years and focus on taking care of myself first.

As a result, I’ve suddenly found myself having *no* pressing goals that would direct my life. No stress about needing to do something big-impact. No constant loneliness and thinking about how to best impress people. Just a sudden freedom to do basically anything.

I’m still in the process of disassembling various mental habits that were meant to keep me single-mindedly focused on the twin goals of saving the world and getting into a relationship. But I’m starting to suspect that even more things were defined by those goals than I had realized. For instance, my self-esteem has usually been pretty bad, probably because I was judging myself and my worth pretty much entirely by how well I did at those two goals. And I didn’t feel like I was doing particularly well at either.

Now I can just… live a day at a time and not sweat it.

It’s going to take a while to get used to this.

Originally published at Kaj Sotala. You can comment here or there.
Monday, November 28th, 2016
10:08 am - Finding slices of joy
Three weeks ago, I ran across an article called “Google’s former happiness guru developed a three-second brain exercise for finding joy”. Yes, the title is kinda cringe-worthy, but the content is good. Here are the most essential five paragraphs:

Successfully reshaping your mindset, [Chade-Meng Tan] argues, has less to do with hours of therapy and more to do with mental exercises, including one that helps you recognize “thin slices of joy.”

“Right now, I’m a little thirsty, so I will drink a bit of water. And when I do that, I experience a thin slice of joy both in space and time,” he told CBC News. “It’s not like ‘Yay!’” he notes in Joy on Demand. “It’s like, ‘Oh, it’s kind of nice.’”

Usually these events are unremarkable: a bite of food, the sensation of stepping from a hot room to an air-conditioned room, the moment of connection in receiving a text from an old friend. Although they last two or three seconds, the moments add up, and the more you notice joy, the more you will experience joy, Tan argues.

“Thin slices of joy occur in life everywhere… and once you start noticing it, something happens, you find it’s always there. Joy becomes something you can count on.” That’s because you’re familiarizing the mind with joy, he explains.

Tan bases this idea on neurological research about how we form habits. Habitual behaviors are controlled by the basal ganglia region of the brain, which also plays a role in the development of memories and emotions. The better we become at something, the easier it becomes to repeat that behavior without much cognitive effort. Tan’s “thin slice” exercise contains a trigger, a routine, and a reward – the three parts necessary to build a habit. The trigger, he says, is the pleasant moment, the routine is the noticing of it, and the reward is the feeling of joy itself.

Since then, I have been working on implementing its advice, and making it a habit to notice the various “thin slices of joy” in my life.
It was difficult to remember at first, and on occasions when I’m upset for any reason it’s even harder to follow, even if I do remember it. Still, it is gradually becoming a more entrenched habit, with me remembering it and automatically following it more and more often – and feeling better as a result.

I’m getting better at noticing the pleasure in sensations like:

- Drinking water.
- Eating food.
- Going to the bathroom.
- Having drops of water fall on my body while in the shower.
- The physicality of brushing teeth, and the clean feeling in the mouth that follows.
- Being in the same room as someone and feeling less alone, even if both are doing their own things.
- Typing on a keyboard and being skilled enough at it to have each finger just magically find the right key without needing to look.

And so on.

Most of these are physical sensations. I would imagine that this would be a lot harder for someone who doesn’t feel comfortable in their body. But for me, a great thing about this is that my body is always with me. Any time I’m sitting comfortably – or standing, or lying, or walking comfortably – I can focus my attention on that comfort and get that little bit of joy.

In the article, it said that

“Thin slices of joy occur in life everywhere… and once you start noticing it, something happens, you find it’s always there. Joy becomes something you can count on.” That’s because you’re familiarizing the mind with joy, he explains.

I feel like this is starting to happen to me. Still not reliably, still not always, still easily broken by various emotional upsets.

But I still feel like I’m making definite progress.
Saturday, November 26th, 2016
12:41 pm - Relationship realizations
Learning experiences: just broke up with someone recently. Part of the problem was that I had some very strong, specific and idealized expectations of what a relationship “should” be like – expectations which caused a lot of trouble, but which I hadn’t really consciously realized that I had, until now.

Digging up the expectations and beating them into mush with a baseball bat came too late to save this particular relationship, but it seems to have had an unexpected side effect: the thought of being single feels a lot less bad now.

I guess that while I had that idealized vision of “being in a relationship”, my mind was constantly comparing singledom to that vision, finding my current existence to be lacking, and feeling bad as a result. But now that I’ve gone from “being in a relationship means X” to “being in a relationship can mean pretty much anything, depending on the people involved”, there isn’t any single vision to compare my current state against. And with nothing to compare against, there’s also nothing that would make me feel unhappy because I don’t have it currently.

Huh.
Friday, September 23rd, 2016
11:54 am - Software for Moral Enhancement
Tuesday, August 16th, 2016
2:21 pm - An appreciation of the Less Wrong Sequences
Saturday, June 11th, 2016
6:11 pm - Error in Armstrong and Sotala 2012
Katja Grace has analyzed my and Stuart Armstrong’s 2012 paper “How We’re Predicting AI – or Failing To”. She discovered that one of the conclusions, “predictions made by AI experts were indistinguishable from those of non-experts”, is flawed due to “a spreadsheet construction and interpretation error”. In other words, I coded the data in one way, there was a communication error and a misunderstanding about what the data meant, and as a result of that, a flawed conclusion slipped into the paper.

I’m naturally embarrassed that this happened. But the reason why Katja spotted this error was that we’d made our data freely available, allowing her to spot the discrepancy. This is why data sharing is something that science needs more of. Mistakes happen to everyone, and transparency is the only way to have a chance of spotting those mistakes.

I regret the fact that we screwed up this bit, but I’m proud of the fact that we did share our data and allowed someone to catch it.

EDITED TO ADD: Some people have taken this mistake to suggest that the overall conclusion, that AI experts are not good predictors of AI timelines, is flawed. That would overstate the significance of this mistake. While one of the lines of evidence supporting this overall conclusion was flawed, several others are unaffected by this error: namely, the fact that expert predictions disagree widely with each other, that many past predictions have turned out to be false, and that the psychological literature on what’s required for the development of expertise suggests that it should be very hard to develop expertise in this domain. (See the original paper for details.)

(I’ve added a note of this mistake to my list of papers.)
Saturday, May 14th, 2016
11:39 am - Smile, You Are On Tumblr.Com
I made a new tumblr blog. It has photos of smiling people! With more to come!

Why? Previously I happened to need pictures of smiles for a personal project. After going through an archive of photos for a while, I realized that looking at all the happy people made me feel really happy and good. So I thought that I might make a habit out of looking at photos of smiling people, and sharing them.

Follow for a regular extra dose of happiness!
Wednesday, April 27th, 2016
9:52 am - Decisive Strategic Advantage without a Hard Takeoff (part 1)
Friday, April 22nd, 2016
6:07 am - Simplifying the environment: a new convergent instrumental goal
Friday, April 15th, 2016
11:48 am - AI risk model: single or multiple AIs?
Tuesday, April 5th, 2016
10:59 am - Disjunctive AI risk scenarios: AIs gaining the power to act autonomously
Monday, April 4th, 2016
12:59 pm - Disjunctive AI risk scenarios: AIs gaining a decisive advantage
Monday, February 8th, 2016
11:03 am - Reality is broken, or, an XCOM2 review
Wednesday, December 16th, 2015
10:10 am - Me and Star Wars
Saturday, November 28th, 2015
6:26 pm - Desiderata for a model of human values
Soares (2015) defines the value learning problem as: “By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?”

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and then take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.

To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.

By “human values”, I here basically mean the values of any given individual: we are not talking about the values of, say, a whole culture, but rather just one person within that culture. While the problem of aggregating or combining the values of many different individuals is also an important one, we should probably start from the point where we can understand the values of just a single person, and then use that understanding to figure out what to do with conflicting values.

In order to make the purpose of this exercise as clear as possible, let’s start with the most important desideratum, of which all the others are arguably special cases:

1. Useful for AI safety engineering.
Our model needs to be useful for the purpose of building AIs that are aligned with human interests, such as by making it possible for an AI to evaluate whether its model of human values is correct, and by allowing human engineers to evaluate whether a proposed AI design would be likely to further human values.

In the context of AI safety engineering, the main model for human values that gets mentioned is that of utility functions. The one problem with utility functions that everyone always brings up is that humans have been shown not to have consistent utility functions. This suggests two new desiderata:

2. Psychologically realistic. The proposed model should be compatible with what we know about current human values, and not make predictions about human behavior which can be shown to be empirically false.

3. Testable. The proposed model should be specific enough to make clear predictions, which can then be tested.

As additional requirements related to the above ones, we may wish to add:

4. Functional. The proposed model should be able to explain what the functional role of “values” is: how do they affect and drive our behavior? The model should be specific enough to allow us to construct computational simulations of agents with a similar value system, and see whether those agents behave as expected within some simulated environment.

5. Integrated with existing theories. The proposed model should, to as large an extent as possible, fit together with existing knowledge from related fields such as moral psychology, evolutionary psychology, neuroscience, sociology, artificial intelligence, behavioral economics, and so on.

However, I would argue that as a model of human value, utility functions also have other clear flaws. They do not clearly satisfy these desiderata:

6. Suited for modeling internal conflicts and higher-order desires. A drug addict may desire a drug, while also desiring that he not desire it.
More generally, people may be genuinely conflicted between different values, endorsing contradictory sets of them given different situations or thought experiments, and they may struggle to behave in a way in which they would like to behave. The proposed model should be capable of modeling these conflicts, as well as the way that people resolve them.

7. Suited for modeling changing and evolving values. A utility function is implicitly static: once it has been defined, it does not change. In contrast, human values are constantly evolving. The proposed model should be able to incorporate this, as well as to predict how our values would change given some specific outcomes. Among other benefits, an AI whose model of human values had this property might be able to predict things that our future selves would regret doing (even if our current values approved of those things), and warn us about this possibility in advance.

8. Suited for generalizing from our existing values to new ones. Technological and social change often causes new dilemmas, for which our existing values may not provide a clear answer. As a historical example (Lessig 2004), American law traditionally held that a landowner controlled not only his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question – could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? In answer to this question, the concept of landownership was redefined to only extend a limited, and not an indefinite, amount upwards. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel.
Our model of value should be capable of figuring out such compromises, rather than treating values such as landownership as black boxes, with no understanding of why people value them.

As an example of using the current criteria, let’s try applying them to the only paper that I know of that has tried to propose a model of human values in an AI safety engineering context: Sezener (2015). This paper takes an inverse reinforcement learning approach, modeling a human as an agent that interacts with its environment in order to maximize a sum of rewards. It then proposes a value learning design where the value learner is an agent that uses Solomonoff’s universal prior in order to find the program generating the rewards, based on the human’s actions. Basically, a human’s values are equivalent to a human’s reward function.

Let’s see to what extent this proposal meets our criteria.

Useful for AI safety engineering. To the extent that the proposed model is correct, it would clearly be useful. Sezener provides an equation that could be used to obtain the probability of any given program being the true reward-generating program. This could then be plugged directly into a value learning agent similar to the ones outlined in Dewey (2011), to estimate the probability of its models of human values being true. That said, the equation is incomputable, though it might be possible to construct computable approximations.

Psychologically realistic. Sezener assumes the existence of a single, distinct reward process, and suggests that this is a “reasonable assumption from a neuroscientific point of view because all reward signals are generated by brain areas such as the striatum”. On the face of it, this seems like an oversimplification, particularly given evidence suggesting the existence of multiple valuation systems in the brain.
On the other hand, since the reward process is allowed to be arbitrarily complex, it could be taken to represent just the final output of the combination of those valuation systems.

Testable. The proposed model currently seems to be too general to be accurately tested. It would need to be made more specific.

Functional. This is arguable, but I would claim that the model does not provide much of a functional account of values: they are hidden within the reward function, which is basically treated as a black box that takes in observations and outputs rewards. While a value learner implementing this model could develop various models of that reward function, and those models could include internal machinery that explained why the reward function output various rewards at different times, the model itself does not make any assumptions about this.

Integrated with existing theories. Various existing theories could in principle be used to flesh out the internals of the reward function, but currently no such integration is present.

Suited for modeling internal conflicts and higher-order desires. No specific mention of this is made in the paper. The assumption of a single reward function that assigns a single reward for every possible observation seems to implicitly exclude the notion of internal conflicts, with the agent always just maximizing a total sum of rewards and being internally united in that goal.

Suited for modeling changing and evolving values. As written, the model seems to consider the reward function as essentially unchanging: “our problem reduces to finding the most probable $p_R$ given the entire action-observation history $a_1 o_1 a_2 o_2 \ldots a_n o_n$.”

Suited for generalizing from our existing values to new ones. There does not seem to be any obvious possibility for this in the model.
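To make the general shape of these proposals a bit more concrete, here is a toy sketch in Python. It is not taken from either paper. It replaces Solomonoff’s universal prior with a simplicity prior over three hand-picked candidate reward functions, and it assumes that the human chooses noisily in proportion to reward (a softmax rule). The candidate names, the description lengths and the `rationality` parameter are all made up for illustration. The last function applies the rule quoted from Dewey (2011): take the action with the highest expected value under a weighted average over the candidates.

```python
import math

# Hypothetical candidates: name -> (made-up description length in bits,
# reward assigned to each outcome). None of this comes from the papers.
CANDIDATES = {
    "likes_tea": (3, {"tea": 1.0, "coffee": 0.0}),
    "likes_coffee": (3, {"tea": 0.0, "coffee": 1.0}),
    "indifferent": (1, {"tea": 0.5, "coffee": 0.5}),
}
ACTIONS = ["tea", "coffee"]


def choice_probability(rewards, action, rationality=3.0):
    """Assumed human model: higher-reward actions are chosen more often (softmax)."""
    weights = {a: math.exp(rationality * rewards[a]) for a in ACTIONS}
    return weights[action] / sum(weights.values())


def posterior(observed_actions):
    """Posterior over the candidate reward functions, given the human's choices."""
    scores = {}
    for name, (length_bits, rewards) in CANDIDATES.items():
        # Simplicity prior 2^-length: a computable toy stand-in for a universal prior.
        score = 2.0 ** -length_bits
        for action in observed_actions:
            score *= choice_probability(rewards, action)
        scores[name] = score
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def best_action(beliefs):
    """Pick the action with the highest belief-weighted average reward."""
    def expected_value(action):
        return sum(p * CANDIDATES[name][1][action] for name, p in beliefs.items())
    return max(ACTIONS, key=expected_value)


beliefs = posterior(["tea", "tea", "coffee", "tea"])
print(beliefs)
print(best_action(beliefs))
```

In this toy run the simple “indifferent” hypothesis keeps most of the probability mass, yet the weighted average still favors tea. Note that the sketch shares the limitations discussed above: each candidate is a single fixed reward function applied to the whole history, so it has no room for internal conflicts or for values that change over time.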
I should note that despite its shortcomings, Sezener’s model seems like a nice step forward: like I said, it’s the only proposal that I know of so far that has even tried to answer this question. I hope that my criteria will be useful in spurring the development of the model further.

As it happens, I have a preliminary suggestion for a model of human values which I believe has the potential to fulfill all of the criteria that I have outlined. However, I am far from certain that I have managed to find all the necessary criteria. Thus, I would welcome feedback, particularly including proposed changes or additions to these criteria.
Thursday, November 12th, 2015
10:42 am - Learning from painful experiences