Feds appoint “AI doomer” to run AI safety at US institute


The US AI Safety Institute—part of the National Institute of Standards and Technology (NIST)—has finally announced its leadership team after much speculation.


Appointed as head of AI safety is Paul Christiano, a former OpenAI researcher who pioneered a foundational AI safety technique called reinforcement learning from human feedback (RLHF), but who is also known for predicting that "there's a 50 percent chance AI development could end in 'doom.'" While Christiano's research background is impressive, some fear that by appointing a so-called "AI doomer," NIST risks encouraging the kind of non-scientific thinking that many critics view as sheer speculation.


There have been rumors that NIST staffers oppose the hiring. A controversial VentureBeat report last month cited two anonymous sources claiming that, seemingly because of Christiano's so-called "AI doomer" views, NIST staffers were "revolting." Some staff members and scientists allegedly threatened to resign, VentureBeat reported, fearing "that Christiano’s association" with effective altruism and "longtermism could compromise the institute’s objectivity and integrity."


NIST's mission is rooted in advancing science by working to "promote US innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life." Effective altruists believe in "using evidence and reason to figure out how to benefit others as much as possible," and longtermists believe that "we should be doing much more to protect future generations"; both views are more subjective and opinion-based.


On the Bankless podcast last year, Christiano said that "there's something like a 10–20 percent chance of AI takeover" that results in humans dying, and that "overall, maybe you're getting more up to a 50-50 chance of doom shortly after you have AI systems that are human level."


"The most likely way we die involves—not AI comes out of the blue and kills everyone—but involves we have deployed a lot of AI everywhere... [And] if for some reason, God forbid, all these AI systems were trying to kill us, they would definitely kill us,” Christiano said.


Critics of so-called "AI doomers" have warned that focusing on any potentially overblown talk of hypothetical killer AI systems or existential AI risks may stop humanity from focusing on current perceived harms from AI, including environmental, privacy, ethics, and bias issues. Emily Bender, a University of Washington professor of computational linguistics who has warned about AI doomers thwarting important ethical work in the field, told Ars that because "weird AI doomer discourse" was included in Joe Biden's AI executive order, "NIST has been directed to worry about these fantasy scenarios" and "that's the underlying problem" leading to Christiano's appointment.


"I think that NIST probably had the opportunity to take it a different direction," Bender told Ars. "And it's unfortunate that they didn't."


As head of AI safety, Christiano will seemingly have to monitor for current and potential risks. He will "design and conduct tests of frontier AI models, focusing on model evaluations for capabilities of national security concern," steer processes for evaluations, and implement "risk mitigations to enhance frontier model safety and security," the Department of Commerce's press release said.


Christiano has experience mitigating AI risks. He left OpenAI to found the Alignment Research Center (ARC), which the Commerce Department described as "a nonprofit research organization that seeks to align future machine learning systems with human interests by furthering theoretical research." Part of ARC's mission is to test whether AI systems are evolving to manipulate or deceive humans, according to ARC's website. ARC also conducts research to help AI systems scale "gracefully."


Because of Christiano's research background, some people think he is a good choice to helm the safety institute, such as Divyansh Kaushik, an associate director for emerging technologies and national security at the Federation of American Scientists. On X (formerly Twitter), Kaushik wrote that the safety institute is designed to mitigate chemical, biological, radiological, and nuclear risks from AI, and Christiano is “extremely qualified” for testing those AI models. Kaushik cautioned, however, that "if there’s truth to NIST scientists threatening to quit" over Christiano's appointment, "obviously that would be serious if true."


The Commerce Department does not comment on its staffing, so it's unclear whether anyone has actually resigned or plans to resign over Christiano's appointment. Since the appointment was announced, Ars has not found any public statements from NIST staffers suggesting they are considering stepping down.


In addition to Christiano, the safety institute's leadership team will include Mara Quintero Campbell, a Commerce Department official who led projects on COVID response and CHIPS Act implementation, as acting chief operating officer and chief of staff. Adam Russell, an expert focused on human-AI teaming, forecasting, and collective intelligence, will serve as chief vision officer. Rob Reich, a human-centered AI expert on leave from Stanford University, will be a senior advisor. And Mark Latonero, a former White House global AI policy expert who helped draft Biden's AI executive order, will be head of international engagement.


"To safeguard our global leadership on responsible AI and ensure we’re equipped to fulfill our mission to mitigate the risks of AI and harness its benefits, we need the top talent our nation has to offer," Gina Raimondo, US Secretary of Commerce, said in the press release. "That is precisely why we’ve selected these individuals, who are the best in their fields, to join the US AI Safety Institute executive leadership team."


VentureBeat's report claimed that Raimondo directly appointed Christiano.


Bender told Ars that there's no advantage to NIST including "doomsday scenarios" in its research on "how government and non-government agencies are using automation."


"The fundamental problem with the AI safety narrative is that it takes people out of the picture," Bender told Ars. "But the things we need to be worrying about are what people do with technology, not what technology autonomously does."

Christiano's views on AI doom, explained


Ars could not immediately reach Christiano for comment, but he has explained his views on AI doom and responsible AI scaling.


In a blog post on LessWrong, he explained that there are two distinctions that "often lead to confusion about" what he believes regarding AI doom.


The first distinction, Christiano wrote, "is between dying ('extinction risk') and having a bad future ('existential risk')." He clarified that he thinks "there’s a good chance of bad futures without extinction, e.g., that AI systems take over but don’t kill everyone." One version of a "bad future" would be "an outcome where the world is governed by AI systems, and we weren’t able to build AI systems who share our values or care a lot about helping us," which Christiano said "may not even be an objectively terrible future."


"But it does mean that humanity gave up control over its destiny, and I think in expectation it’s pretty bad," Christiano wrote.


The other distinction is "between dying now and dying later," Christiano said, clarifying that dying later may not exactly result "from AI," but from circumstances following AI advancement.


"I think that there’s a good chance that we don’t die from AI, but that AI and other technologies greatly accelerate the rate of change in the world and so something else kills us shortly later," Christiano wrote.


In that post, Christiano breaks down what he estimates are the probabilities of an AI takeover (22 percent), that "most" humans will die "within 10 years of building powerful AI" that makes labor obsolete (20 percent), and that "humanity has somehow irreversibly messed up our future within 10 years of building powerful AI" (46 percent).


He clarified that these probabilities are only intended "to quantify and communicate what I believe, not to claim I have some kind of calibrated model that spits out these numbers." He said these numbers are basically guesses that often change depending on new information that he receives.


"Only one of these guesses is even really related to my day job (the 15 percent probability that AI systems built by humans will take over)," Christiano wrote. "For the other questions I’m just a person who’s thought about it a bit in passing. I wouldn’t recommend deferring to the 15 percent, but definitely wouldn’t recommend deferring to anything else."


Timnit Gebru, who founded the Distributed Artificial Intelligence Research Institute after Google fired her from its AI ethics research team when she spoke out against discrimination, criticized Christiano's blog on X.


"What's better, that he wrote a blog on a cult forum, or that he just pulled random numbers out of his behind for this apocalyptic prediction?" Gebru wrote. "As they say, why not both."


In 2023, Christiano's nonprofit ARC helped test whether OpenAI's GPT-4 might take over the world and ultimately concluded that GPT-4 did not pose an existential risk because it was "ineffective" at "autonomous replication." Because ARC is concerned about AI systems manipulating humans, Christiano has commented on LessWrong that gain-of-function research becomes more important as AI systems become smarter, suggesting that his work evaluating systems at the safety institute will be critical.


"At this point it seems like we face a much larger risk from underestimating model capabilities and walking into danger than we do from causing an accident during evaluations," Christiano wrote. "If we manage risk carefully, I suspect we can make that ratio very extreme, though of course that requires us actually doing the work."


Christiano's take on pausing AI development


Christiano isn't the only one warning about AI's existential risks. In the past year, everyone from OpenAI executives to leaders of 28 countries has sounded alarms over potentially "catastrophic" AI harms. But critics like Meta Chief AI Scientist Yann LeCun have countered these warnings by claiming that the "whole debate around existential risk is wildly overblown and highly premature."


At the AI Safety Institute, Christiano will have the opportunity to mitigate actual AI risks at a time when people who build, test, and invest in AI have claimed that the speed of AI development is outpacing risk assessment. And if there's any truth to what Elon Musk says—which is hotly contested—AI will be "smarter than any one human probably around the end of next year."


To minimize surprises, Christiano's team will need to refine risk assessments, as he anticipates that models will get smarter and fine-tuning them will get riskier. Last October, on an effective altruism forum, Christiano wrote that regulations would be needed to keep AI companies in check.


"Sufficiently good responsible scaling policies (RSPs) could dramatically reduce risk" by "creating urgency around key protective measures and increasing the probability of a pause" in AI development "if those measures can’t be implemented quickly enough," Christiano explained.


Even with regulations around scaling, though, Christiano warned that "the risk from rapid AI development is very large, and that even very good RSPs would not completely eliminate that risk."


While some critics who fear existential AI risks have in the past year called for a temporary pause in frontier AI development until protective measures improve, Christiano has argued that only a unified global pause would come without significant costs.


Currently, Christiano has said that a pause isn't necessary because "the current level of risk is low enough that I think it is defensible for companies or countries to continue AI development if they have a sufficiently good plan for detecting and reacting to increasing risk."