
Unaligned AI

Updated: Jun 28, 2022

By Mehar Bhasin

Speaking at the South by Southwest tech conference in Austin on March 11, 2018, Tesla and SpaceX boss Elon Musk issued a dire warning about the danger of artificial intelligence: “Mark my words, A.I. is far more dangerous than nukes.” Musk, though, is far from alone in sharing doomsday AI scenarios. The late physicist Stephen Hawking was similarly forthright, warning that the emergence of AI could be the “worst event in the history of our civilization” unless society finds a way to control its rapid development and keep it ethical.

Toby Ord, Senior Research Fellow in Philosophy at Oxford University, estimates a one in ten likelihood that unaligned AI will cause human extinction or a permanent and drastic curtailment of humanity's potential in the next hundred years. A fundamental challenge for both current and future AI systems is the alignment problem: making sure that a system's goals match human values. The idea is that an artificial intelligence designed with the proper moral system wouldn’t act in a way that is detrimental to human beings in the first place.

Steve Wozniak, Bill Gates, and many leading technology researchers have also recently expressed concern in the media about the risks posed by AI. Why is the subject suddenly in the headlines?

Existential risk

Existential risk from artificial general intelligence is the hypothesis that substantial progress in artificial general intelligence (AGI) could result in human extinction or some other unrecoverable global catastrophe. It is argued that the human species currently dominates other species because the human brain has some distinctive capabilities that other animals lack. If AI surpasses humanity in general intelligence and becomes "superintelligent", then it could become difficult or impossible for humans to control. Just as the fate of the mountain gorilla depends on human goodwill, so might the fate of humanity depend on the actions of a future machine superintelligence.

Because AI has the potential to become more intelligent than any human, we have no sure-fire way of predicting how it will behave. We can’t use past technological developments as much of a basis because we’ve never created anything that can, wittingly or unwittingly, outsmart us. The best example of what we could face may be our own evolution. People now control the planet, not because we’re the strongest, fastest or biggest, but because we’re the smartest. If we’re no longer the smartest, are we assured to remain in control?

Alignment With Human Values

Major approaches to the control problem include alignment, which aims to align AI goal systems with human values, and capability control, which aims to reduce an AI system's capacity to harm humans or gain control. One source of concern is that controlling a superintelligent machine, or instilling it with human-compatible values, may be a harder problem than naïvely supposed. Any mismatch between a superintelligent machine’s goals and humanity’s goals, broadly speaking, is potentially catastrophic.

Example of paperclip apocalypse

The notion of an AI-led apocalypse arises from a thought experiment by Nick Bostrom, a philosopher at the University of Oxford. Bostrom was examining the 'control problem': how can humans control a super-intelligent AI when the AI is orders of magnitude smarter than they are? Bostrom's thought experiment goes like this: suppose that someone programs and switches on an AI that has the goal of producing paperclips. The AI is given the ability to learn, so that it can invent ways to achieve its goal better. As the AI is super-intelligent, if there is a way of turning something into paperclips, it will find it. It will want to secure resources for that purpose. The AI is single-minded and more ingenious than any person, so it will appropriate resources from all other activities. Soon, the world will be inundated with paperclips.

It gets worse. We might want to stop this AI. But it is single-minded and would realise that being stopped would subvert its goal. Consequently, the AI would become focussed on its own survival. It was already competing with humans for resources, but now it will want to fight humans because they are a threat (think The Terminator). This AI is much smarter than us, so it is likely to win that battle. We have a situation in which an engineer has switched on an AI for a simple task but, because the AI expanded its capabilities through its capacity for self-improvement, it has innovated to better produce paperclips, developed the power to appropriate the resources it needs, and ultimately acted to preserve its own existence.

Bostrom argues that it would be difficult to control a super-intelligent AI – in essence, better intelligence beats weaker intelligence. Tweaks to the AI’s motivation may not help. For instance, you might ask the AI to produce only a set number of paperclips, but the AI may become concerned we might use them up, and still attempt to eliminate threats. It is hard to program clear preferences, as economists well know.

The conclusion is that we exist on a knife-edge. Turning on such an AI might be the last thing we do. The paperclip maximizer is the canonical thought experiment showing how an artificial general intelligence, even one designed competently and without malice, could ultimately destroy humanity. The thought experiment shows that AIs with apparently innocuous values could pose an existential threat.
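The resource-appropriation step of the thought experiment can be sketched as a toy program. This is only an illustration under loud assumptions, not a model of any real AI system: the World class, the one-to-one conversion of resources into paperclips, and the three available actions are all hypothetical. What it shows is that an agent scored only on paperclips has no reason to leave anything for other activities.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Toy world state; all quantities are arbitrary illustrative units."""
    resources_for_humans: int  # resources currently serving other activities
    paperclips: int = 0


def paperclip_objective(world: World) -> int:
    """The only thing the agent is scored on: the number of paperclips."""
    return world.paperclips


def step(world: World, take: int) -> World:
    """Convert up to `take` units of resources into the same number of paperclips."""
    take = min(take, world.resources_for_humans)
    return World(world.resources_for_humans - take, world.paperclips + take)


def run_maximizer(world: World, max_steps: int = 100) -> World:
    """Greedy agent: at each step, choose the action that scores highest."""
    for _ in range(max_steps):
        # Hypothetical action set: take nothing, a little, or a lot.
        candidates = [step(world, take) for take in (0, 1, 10)]
        best = max(candidates, key=paperclip_objective)
        if paperclip_objective(best) == paperclip_objective(world):
            break  # no action improves the objective any further
        world = best
    return world


if __name__ == "__main__":
    final = run_maximizer(World(resources_for_humans=250))
    print(final)  # every unit of resources has been turned into paperclips
```

Nothing in the objective mentions the resources humans need, so the agent stops only when there is nothing left to convert. The sketch leaves out learning, self-improvement and self-preservation, which are what make the full scenario so much harder to control.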

What Values Do We Want?

Understanding what “values” we want is among the biggest challenges facing AI researchers.

“The issue, of course, is to define what exactly these values are, because people might have different cultures, different parts of the world, different socioeconomic backgrounds — I think people will have very different opinions on what those values are. And so that’s really the challenge,” says Stefano Ermon, an assistant professor at Stanford.

Roman Yampolskiy, an associate professor at the University of Louisville, agrees. He explains, “It is very difficult to encode human values in a programming language, but the problem is made more difficult by the fact that we as humanity do not agree on common values, and even parts we do agree on change with time.”


Many believe the only way to prevent, or at least temper, the most malicious AI from wreaking havoc is some sort of regulation. Renowned futurist Martin Ford agrees, with a caveat. Regulation of AI implementation is fine, he said, but not of the research itself.

“You regulate the way AI is used,” he said, “but you don’t hold back progress in basic technology. I think that would be wrong-headed and potentially dangerous.” The reason is that any country that lags in AI development is at a distinct disadvantage militarily, socially and economically. The solution, Ford continued, is selective application: “We decide where we want AI and where we don’t; where it’s acceptable and where it’s not. And different countries are going to make different choices. So, China might have it everywhere, but that doesn’t mean we can afford to fall behind them in the state-of-the-art.”

Figuring out how to regulate AI effectively will be a challenging task, because the fundamental problem is that many AI computations are not “explainable”. The algorithm makes decisions, but we don’t know why it made a particular decision. This lack of transparency makes regulating AI far harder than regulating the more explainable and auditable technology that often informed decision-making in the last century.

What comes next?

Advances in artificial intelligence show how far we’ve come toward the goal of creating thinking machines. But the challenges of artificial intelligence and the alignment problem also remind us of how much more we have to learn before we can create human-level intelligence.

AI scientists and researchers are exploring several different ways to overcome these hurdles and create AI systems that can benefit humanity without causing harm. Until then, we’ll have to tread carefully and beware of how much credit we assign to systems that mimic human intelligence on the surface.


Conn, Ariel “How Do We Align Artificial Intelligence with Human Values?”, Future of Life Institute, 2017.

Dickson, Ben “Understanding the AI alignment problem”, TechTalks, 2021.

Gans, Joshua “AI and the paperclip problem”, VoxEU, 2018.

Gonzalez, Christian “The Most Important Question in AI Alignment”, Montreal AI Ethics Institute, 2019.

LessWrong Discussion “Paperclip Maximizer”, LessWrong, 2020.

Stewart, Duncan, and others “AIs wide shut: AI regulation gets (even more) serious”, Deloitte Insights, 2021.

Thomas, Mike “7 Dangerous Risks of Artificial Intelligence”, Built In, 2021.
