What is past, and passing, and to come?
I've realised lately that I haven't posted much on my blog this year. Funnily enough, this coincides with 2020 being my most productive year so far. So in addition to belatedly putting up a few cross-posts from elsewhere, I thought it'd be useful to share here some of the bigger projects I've been working on which haven't featured elsewhere on this blog.
The most important is AGI safety from first principles (also available here as a PDF), my attempt to put together the most compelling case for why the development of artificial general intelligence might pose an existential threat to humanity. It's long (about 15,000 words) but I've tried to make it as accessible as possible to people without a machine learning background, because I think the topic is so critically important, and because there's an appalling lack of clear explanations of what might go wrong and why. Early work by Bostrom and Yudkowsky is less relevant in the context of modern machine learning; more recent work is scattered and brief. I originally intended to just summarise other people's arguments, but as the report grew, it became more representative of my own views and less representative of anyone else's. So while it covers the standard ideas, I also think that it provides a new perspective on how to think about AGI - one which doesn't take any previous claims for granted, but attempts to work them out from first principles.
A second big piece of work is Thiel on progress and stagnation, a 100-page compendium of quotes from Peter Thiel on - you guessed it - progress and stagnation in technology, and in society more generally. This was a joint project with Jeremy Nixon. We both find Thiel's views exciting and thought-provoking - but apart from his two books (which focused on different topics), those views had previously been scattered across the internet. Our goal was to select and arrange quotes from him into a clear, compelling and readable presentation of his views. You can judge for yourself whether we succeeded - although if you're pressed for time, there's a summary here.
Thirdly, I've put together the Effective Altruism archives reading list. This collates a lot of material from across the internet written by EAs on a range of relevant topics, much of which is otherwise difficult to find (especially older posts). The reading list is aimed at people who are familiar with EA but want to explore in more detail some of the ideas that have historically been influential within EA. These are often more niche or unusual than the material used to promote EA, and I don't endorse all of them - although I tried to only include high-quality content that I think is worth reading if you're interested in the corresponding topic.
Fourth is my first published paper, Avoiding Side Effects By Considering Future Tasks, which was accepted at NeurIPS 2020! Note, though, that my contributions were primarily on the engineering side; the paper is my coauthor Victoria's brainchild. From the abstract: "Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. ... Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions."
Fifth, a series of posts on AI safety, exploring safety problems and solutions applicable to agents trained in open-ended environments, particularly multi-agent ones. Unlike most safety techniques, these don't rely on precise specifications - instead they involve "shaping" our agents to think in safer ways and to have safer motivations. Note that this is primarily speculative brainstorming; I'm not confident in any of these ideas, although I'd be excited to see further exploration along these lines.
More generally, I've been posting a range of AI safety content on the Alignment Forum; I'm particularly happy about these three posts. And I've been asking questions I'm curious about on Less Wrong and the Effective Altruism Forum. Lastly, I've been very active on Twitter over the past couple of years; I haven't yet gotten around to collating my best tweets, but will do so eventually (and post them on this blog).
So that's what I've been up to so far this year. What's now brewing? I'm currently drafting my first piece of work for my PhD, on the links between biological fitness-maximisation and optimisation in machine learning. A second task is to revise the essay on Tinbergen's levels of explanation which I wrote for my Cambridge application - I think there are some important insights in there, but it needs a lot of work. I'm also writing a post tentatively entitled A philosopher's apology, explaining why I decided to get a PhD, what works very well about academia and academic philosophy, what's totally broken, and how I'm going to avoid (or fix) those problems. Lastly, I'm ruminating over some of the ideas discussed here, with the goal of (very slowly) producing a really comprehensive exploration of them. Thoughts or comments on any of these are very welcome!
Zooming out, this year has featured what was probably the biggest shift of my life so far: the switch from my technical career as an engineer and AI researcher, to becoming a philosopher and general thinker-about-things. Of course this was a little butterfly-inducing at times. But increasingly I believe that what the world is missing most is novel and powerful ideas, so I'm really excited about being in a position where I can focus on producing them. So far I only have rough stories about how that happens, and what it looks like to make a big difference as a public intellectual - I hope to refine these over time to be able to really leverage my energies. Then onwards, and upwards!