Decoding AI Decision-Making: New Insights and Policy Approaches

Mark Reddish, September 26, 2024
Download the Full Report

AI is rapidly advancing, and it will soon become a powerful enough tool for nations, companies, and individuals to dominate those without it. But if AI decision-making cannot be fully explained, operators cannot determine why a model fails to perform as expected or even acts contrary to their interests, and such a system cannot be used reliably for critical applications.

Even today, advanced AI systems have outpaced their developers’ understanding, which is why frontier models are often described as black boxes. AI systems exhibit unpredictable behavior, including deception and hallucination, and identifying and correcting some types of errors might not be possible. Without being able to fully explain, let alone control, AI systems’ behavior, the public will face risks ranging from biased employment decisions to physical harm.

Progress is being made on explainability, but even the developers of advanced AI models acknowledge that they may never fully understand these systems. For example, OpenAI recently released its o1 model series, which uses chain-of-thought reasoning: the model breaks a complex problem into simpler steps and produces an account of its thought process along the way. However, OpenAI cautioned that the model’s reported thought process “may not be fully legible and faithful in the future or even now.” Additionally, OpenAI opted to hide the model’s raw chain of thought, instead providing a filtered summary, and has reportedly threatened to ban users who tried to probe how the model works. This raises serious questions about whether our ability to understand and control AI systems will keep pace with their growing capabilities and complexity.
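To make that gap concrete, here is a minimal sketch using the OpenAI Python SDK. The model name and the prompt are illustrative assumptions, not part of the report; the point is simply that a user receives a final answer (and at most a filtered summary), while the raw chain of thought stays hidden and is reported only as a token count.

```python
# Minimal illustrative sketch (assumes the OpenAI Python SDK and access to an
# o1-series model; the model name below is an assumption and may differ).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?"
            ),
        }
    ],
)

# What the user sees: the final answer.
print(response.choices[0].message.content)

# What the user does not see: the raw chain of thought. The API reports only
# a count of hidden "reasoning tokens" consumed while producing the answer.
print(response.usage.completion_tokens_details.reasoning_tokens)
```

In other words, the reasoning is billed and measurable, but not inspectable, which is precisely the explainability gap the report examines.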

Companies and policymakers must adopt reasonable strategies to improve AI explainability and mitigate risk. Humans cannot always explain their own thought processes accurately either, sometimes offering false post-hoc rationalizations for their decisions. But AI is being introduced into military applications that will empower automated systems to decide whom to kill, with broader implications for triggering global conflicts, so we must proceed with caution. The Center for AI Policy’s latest report offers an overview of explainability concepts and techniques, along with recommendations for reasonable policies that mitigate risk while maximizing the benefits of these powerful technologies.

Read the Center for AI Policy report on “Decoding AI Decision-Making: New Insights and Policy Approaches” here.
