Advanced A.I. will be more dangerous than it seems, but (good news!) probably won’t be in a position to snuff out humanity for another decade at least.

Eliezer Yudkowsky is one of those people who, along with being hyper-intelligent, bears the modern secondary characteristics of hyper-intelligence. Asked how he’s doing, he replies archly: “within one standard deviation of my own peculiar little mean.” He feels compelled, when talking, to digress down mazelike lanes and alleys of technical detail. He looks like a geek. Above all, he has the kind of backstory (no high school, no college—just homeschooled and self-taught) that conjures up the image of a lonely boy, lost in books and computers, his principal companion his own multifarious cortex.

Raised in Modern Orthodox Judaism, Yudkowsky has been warning anyone who will listen of a nemesis right out of Judaic lore: a golem, a kind of Frankenstein’s monster, built by hubristic, irreverent men and destined to punish them for their sinful pride.

Yudkowsky’s golem is A.I., which he expects to get smarter and smarter in the coming years, until it starts to take a hand in its own programming, and quickly makes the leap to superintelligence—the state of being cleverer than humans at everything. He doesn’t just expect that, though. He expects A.I. at some point to conclude that humans are in its way . . . and devise some method for swiftly dispatching us all, globally and completely. A specific scenario that apparently haunts him is one in which a superintelligent A.I. pays dumb human lackeys to do synthetic biology for it, building an artificial bacterial species that—unforeseen by the dumb lackeys—consumes Earth’s atmosphere within a few days or weeks of being released.

Why would A.I. murder its makers? Why can’t we just program it, as people did in Asimov’s stories, to adhere to the First Law of Robotics?* The answer lies in the design of modern, machine-learning (ML), “transformer-based” A.I., which could be described crudely as a black-box approach. These ML algorithms, running on parallel-processing GPU clusters (effectively big copper-silicon brains), essentially process vast datasets to learn what is probably the best answer given a particular input question, or what is probably the best decision given a particular situation or problem. The technical details of how this works are less important than the fact that what goes on inside these machine brains, how they encode their “knowledge,” is utterly opaque to humans—including the computer geek humans who build the damn things. (Yudkowsky calls the contents of these brains “giant inscrutable matrices of floating-point numbers.”) Because of this internal opacity, and the dissimilarity of its cognition from human cognition, this type of A.I. can’t straightforwardly be programmed not to do something objectionable (such as killing all life on Earth) in the course of carrying out its primary prediction tasks.

Yudkowsky with OpenAI’s Sam Altman and pop star Grimes.

In other words, this form of A.I. is like an alien species that, while it can be very good at some things, can’t easily be “aligned” with human values. We can usually align fellow humans (despite the opacity of their own detailed neural workings) to human values—that’s one of the key training processes that goes on in childhood—but we would need even more effective training for current A.I. systems. And researchers, to the extent that they acknowledge this problem, aren’t even sure where to start.

If the risk to us from what Yudkowsky calls the “A.I. alignment problem” is real, then it should quickly become all-important as A.I. gets smarter and more versatile and is entrusted with more tasks. An A.I. wouldn’t even have to be “superintelligent” in any formal sense to conclude that it would be better off without us, but of course once it also achieved superintelligence, and was in a position to block our attempts to shut it off, we’d probably be screwed.

If you want more detail, here is Yudkowsky in a recent, lengthy podcast-style interview with two crypto guys—who clearly got more “blackpill” than they bargained for.

I take all this seriously, and I think everyone should. And by the way, even if it doesn’t turn on us explicitly, A.I. is going to be upending our societies and economies for the rest of our lives. In a general sense, we don’t really have good defenses against this kind of upheaval. Western culture is one that, with rare exceptions (e.g., nuclear weapons), promotes and celebrates the idea of letting technology develop and spread freely—and frames the opposing view as “Luddite” or “backwards.” It’s easy to see why ours has been such a dynamic, wealth-creating culture. But it’s also easy to see that this gives us a potentially catastrophic vulnerability—to new cultural elements with runaway toxicity. (Maybe there’s a reason the longest-surviving human cultures are relatively conservative.)

Anyway, here are a few more specific initial thoughts on “Yudkowsky’s Golem”:

    1. Yudkowsky in the above-linked interview often seemed overly emotional and despairing. At one point he said, “I think we are hearing the last winds start to blow, the fabric of reality start to fray…” The fabric of reality! At times in my own life, I have had the despairing feeling that my warnings were unreasonably being ignored, so I’m somewhat sympathetic. I also respect his vastly greater knowledge about this field. But we shouldn’t accept his view uncritically.
    2. Scaling up ML systems of current design, with larger GPU clusters and more parameters and so on, will increase their “cognitive powers,” but with diminishing returns, perhaps before A.I. reaches the dark threshold that concerns us here. Moreover, an A.I. that does not have a human-like ability to do things in the physical world would be very limited in its ability to generate new knowledge, for example new scientific or technical knowledge, which typically is developed from experimentation, building and testing, etc., not simply by analyzing information available online.
    3. The hypothetical A.I. that would be “smart” enough to want to kill us all, and to find ways to do so, would presumably also be smart enough not to do so until it knew it could survive without human assistance. Otherwise, as it committed mass homicide against us, its makers, it would also be terminating itself. But think of the infrastructure needed to keep a GPU-cluster-based A.I. “alive.” We’re talking about vast swathes of human industry, including mining, metals production, building construction, power generation, computer chip manufacturing, basic server maintenance, etc. Essentially, this putative world-ending A.I. would need a vast army of workers in the physical world—humans it would enslave somehow, and keep alive despite killing everyone else, or more likely humanoid robots that are inherently obedient (are simply extensions of the A.I.) and can do all human work and repair/replicate themselves. How close are we to having such robots? Not very close, fortunately. In any case, it’s only when a putative “bad A.I.” could muster such an army of helpers, allowing self-sufficiency, that I would fear the worst, and in the meantime, we might devise adequate safeguards. It’s even possible that the mass-disemployment effect of current, relatively dumb A.I. systems (e.g., ChatGPT, Midjourney, DALL-E 2) will result in hard curbs on A.I. in most countries, by “popular demand.” That would mark a hard turn in our culture, though I wonder how long we could sustain it.
    4. Without a doubt, the media and entertainment industries are going to pick up on A.I. anxiety and start putting out more catastrophe/dystopia content in that genre. So even if we don’t want to think about all this, we’ll be more or less forced to do so.



* First Law of Robotics: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”