OpenAI Unhobbles o1, Epitomizing the Relentless Pace of AI Progress

Jakub Kraus
September 18, 2024

The LSAT, or Law School Admission Test, is a standardized test required for admission to most law schools in America.

OpenAI has tested its AI models on the LSAT. Its top 2022 model, GPT-3.5, scored around the 40th percentile. Its top 2023 model, GPT-4, scored around the 88th percentile.

Now another year has passed, and AI capabilities have blazed even further ahead. Last week, OpenAI introduced a new model called “o1” that gets almost all of the LSAT questions correct. Based on historical LSAT data, the o1 model is likely scoring in the 98th or 99th percentile. This is on par with (human) students at the top law schools in the country.

This leap in LSAT performance is part of a larger wave of ongoing AI breakthroughs. In recent years, AI systems have grown significantly more competent across a wide variety of domains. For example, o1 also made sizable strides in math, physics, and computer programming, so much so that OpenAI is already using o1 to author contributions to the company's own codebase.

The frenetic pace of AI advancement rests on a trifecta of durable forces. First, companies are spending billions of dollars every week to build warehouse-sized supercomputers that will train the next generation of AI systems. Second, researchers are constantly finding more efficient training algorithms for converting computational power into AI capabilities. Third, as o1 demonstrates, engineers are simultaneously discovering techniques that boost performance after the main training phase.

In June, former OpenAI researcher Leopold Aschenbrenner coined the term “unhobbling” to describe this third driver of growth. In his view, a handful of removable limitations are holding AI models back from exercising their full “raw capabilities.”

In particular, Aschenbrenner stressed that current AI chatbots must respond to every question with an immediate, top-of-mind answer, whereas humans can solve difficult problems by spending a long time thinking. Thus, Aschenbrenner predicted, a critical form of future unhobbling would center on giving chatbots time to “think.”

OpenAI’s research results suggest Aschenbrenner was right. The o1 model’s performance consistently improved as OpenAI engineers gave it more computing operations (“compute”). Some compute went into a novel algorithm for teaching o1 to think out loud (“train-time compute”), and some compute simply allowed o1 to think longer in response to questions (“test-time compute”). In both cases, the gains were remarkably reliable:

[Figure: plots from OpenAI’s research results, showing o1’s performance climbing steadily with both train-time and test-time compute.]
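To build intuition for why extra test-time compute helps, here is a minimal, self-contained Python sketch of one well-known technique from the research literature: self-consistency, i.e., majority voting over independent reasoning attempts. This is a toy simulation with made-up parameters (P_CORRECT, NUM_CHOICES, NUM_QUESTIONS), not OpenAI's actual method, which relies on a longer private chain of thought; it only illustrates the general pattern that accuracy rises as a model spends more attempts per question.

```python
import random
from collections import Counter

# Toy illustration of test-time compute scaling (NOT OpenAI's method):
# a simulated solver answers a multiple-choice question correctly with
# probability P_CORRECT per independent attempt; wrong attempts scatter
# uniformly across the remaining choices. Majority voting over more
# attempts (more "thinking") raises the odds the final answer is correct.

P_CORRECT = 0.55      # assumed per-attempt accuracy of the simulated solver
NUM_CHOICES = 5       # e.g., a five-option LSAT-style question
NUM_QUESTIONS = 2000  # questions simulated per experiment

def attempt() -> int:
    """One independent reasoning attempt; choice 0 is the correct answer."""
    if random.random() < P_CORRECT:
        return 0
    return random.randint(1, NUM_CHOICES - 1)

def majority_vote_accuracy(samples_per_question: int) -> float:
    """Fraction of questions answered correctly after majority voting."""
    correct = 0
    for _ in range(NUM_QUESTIONS):
        votes = Counter(attempt() for _ in range(samples_per_question))
        if votes.most_common(1)[0][0] == 0:
            correct += 1
    return correct / NUM_QUESTIONS

random.seed(0)  # reproducible runs
for n in (1, 3, 9, 27, 81):
    print(f"{n:>2} attempts per question -> {majority_vote_accuracy(n):.1%} accuracy")
```

In this toy setup, a solver that is only modestly better than chance on any single attempt answers nearly every question correctly once it is allowed a few dozen attempts. Real systems behave differently in the details, but the direction of the trend, steady gains from spending more compute at inference time, is the point.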

To date, most AI progress has come from scaling training hardware and improving training software—the first two forces mentioned earlier. But now, o1 shows that the third driver of AI progress, unhobbling, will play an increasingly important role in fueling further breakthroughs.

For policymakers, the most important takeaway is to remain vigilant and proactive. Unhobbled models like o1 bring exciting new AI capabilities, but those capabilities come hand-in-hand with escalating safety hazards. As the relentless pace of AI progress continues, Congress must redouble its efforts to pass AI safety legislation.
