Decoding AI Decision-Making: New Insights and Policy Approaches

Mark Reddish, September 26, 2024

AI is rapidly advancing, and it will soon become a sophisticated enough tool to let the nations, companies, and individuals that have it dominate those without it. But if AI decision-making cannot be adequately explained, operators cannot tell whether a model will perform as expected or even act contrary to their interests, and the technology cannot be used reliably for critical applications. 

Even today, advanced AI systems have outpaced their developers' understanding, which is why frontier models are often described as black boxes. AI systems exhibit unpredictable behavior, including deception and hallucination, and identifying and correcting some types of errors might not be possible. Without the ability to fully explain, let alone control, AI systems' behavior, the public will face risks ranging from biased employment decisions to physical harm. 

Progress is being made on explainability, but even the developers of advanced AI models acknowledge that they may never fully understand these systems. For example, OpenAI recently released its o1 model series, which uses chain-of-thought reasoning: the model breaks a complex problem into simpler steps and produces an account of its thought process. However, OpenAI cautioned that the model's reported thought process "may not be fully legible and faithful in the future or even now." Additionally, OpenAI opted to hide the model's raw chain of thought, providing a filtered summary instead, and has reportedly threatened to ban users who tried to probe how the model works. This raises serious questions about whether our ability to understand and control AI systems will keep pace with their growing capabilities and complexity.  
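To make the visibility gap concrete, the snippet below is a minimal sketch of what a developer actually sees when querying an o1-series model through the standard OpenAI Python client. The model name and prompt are illustrative placeholders, not drawn from the report.

```python
# Minimal sketch: querying a reasoning model and inspecting what it returns.
# Assumes the official `openai` Python package and a valid OPENAI_API_KEY;
# the model name "o1-preview" and the prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves at 3:40 pm and arrives at 6:05 pm. "
                "How long is the trip? Explain your reasoning step by step."
            ),
        }
    ],
)

# Only the final message is returned; the raw chain of thought that produced
# it stays hidden on OpenAI's side, which is the visibility gap noted above.
print(response.choices[0].message.content)
```

Even when the prompt explicitly asks for step-by-step reasoning, what comes back is the model's own account of its process, which, as OpenAI cautions, may not faithfully reflect the computation that actually produced the answer.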

Companies and policymakers must adopt reasonable strategies to improve AI explainability and mitigate risk. Humans cannot always explain their own thought processes accurately either, sometimes offering false post-hoc rationalizations for their decisions. But AI is being introduced into military applications that will empower automated systems to decide whom to kill, with broader implications for triggering global conflicts. We must proceed with caution. The Center for AI Policy's latest report offers an overview of explainability concepts and techniques, along with recommendations for reasonable policies to mitigate risk while maximizing the benefits of these powerful technologies. 

Read the Center for AI Policy report on “Decoding AI Decision-Making: New Insights and Policy Approaches” here.
