The Rapid Rise of Autonomous AI

March 20, 2025

Understanding precisely how quickly AI systems are becoming capable of independently executing complex tasks is crucial for both policymakers and the public. Clear, intuitive metrics that measure AI autonomy in terms directly comparable to human capabilities can help everyone better understand the potential impacts and risks associated with rapidly advancing AI technologies. 

New research from Model Evaluation & Threat Research (METR), a non-profit dedicated to empirical evaluations of frontier AI systems, provides exactly such a metric. METR introduces the "50% task completion time horizon," offering a way to quantify AI autonomy based explicitly on human performance benchmarks.

Key Result: AI Autonomy is Doubling Fast

METR’s results are striking. They found that the length of tasks AI systems can complete independently, measured by the time skilled humans require for the same work, is doubling approximately every seven months. This exponential rate has held steady since 2019, suggesting the trend is robust.

To put this plainly: if this trend holds, frontier AI systems could, within five years, independently execute complex software projects that today require weeks or even months of human expert labor.
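As a rough sanity check on that extrapolation, the arithmetic is simple enough to sketch in a few lines of Python. The one-hour starting horizon below is an illustrative assumption, not a figure from METR's paper; only the seven-month doubling time comes from the research.

```python
# Back-of-the-envelope extrapolation of the 50% time horizon under a
# fixed doubling time. The one-hour starting horizon is an assumed,
# illustrative figure, not METR's reported value.
current_horizon_hours = 1.0     # assumed 50% time horizon today
doubling_time_months = 7.0      # doubling time reported by METR
months_ahead = 60.0             # five years out

doublings = months_ahead / doubling_time_months           # ~8.6 doublings
projected_hours = current_horizon_hours * 2 ** doublings  # ~380 hours

print(f"{doublings:.1f} doublings -> {projected_hours:.0f} hours")
print(f"roughly {projected_hours / 40:.1f} forty-hour work weeks")
```

Under these assumptions, roughly 8.6 doublings compound to about a 380-fold increase, turning a one-hour horizon into nine to ten work weeks of full-time expert labor, consistent with the "weeks or even months" framing above.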

Measuring AI Autonomy

METR’s approach is straightforward:

  1. Tasks are completed by human experts and carefully timed.
  2. AI systems are then tasked with completing the same or similar challenges under comparable conditions.
  3. Researchers plot AI success rates against how long humans took to complete these tasks, defining an AI system's "time horizon" as the human task length at which it succeeds half the time (a minimal sketch of this fit follows the list).
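To make step 3 concrete, here is one way such a horizon can be estimated: fit a logistic curve of AI success against the logarithm of human completion time, then solve for the task length at which the fitted probability crosses one half. The data below are synthetic and generated inside the script; nothing here is METR's code or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: per-task human completion times (minutes)
# and whether the AI succeeded. Generated here for illustration only;
# these are not METR's measurements.
rng = np.random.default_rng(0)
human_minutes = rng.uniform(1, 480, size=400)         # tasks from 1 min to 8 h
true_horizon = 60.0                                   # AI succeeds ~50% at 1 h
p_success = 1 / (1 + human_minutes / true_horizon)    # success rarer on longer tasks
ai_succeeded = rng.random(400) < p_success

# Regress success on log task length, then find the task length at
# which the fitted success probability equals 0.5 (where the model's
# log-odds, coef * x + intercept, crosses zero).
X = np.log2(human_minutes).reshape(-1, 1)
model = LogisticRegression().fit(X, ai_succeeded)
horizon_minutes = 2 ** (-model.intercept_[0] / model.coef_[0, 0])
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

The logistic fit yields a full success-versus-task-length curve; the 50% crossing is simply a convenient one-number summary of where the AI's reliability gives out.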

This approach allows policymakers and researchers alike to quantify progress clearly, providing an intuitive benchmark of how AI autonomy compares to human capabilities.

To ensure realistic evaluation, METR conducted experiments using a carefully constructed benchmark called Human-Calibrated Autonomy Software Tasks (HCAST). These tasks span domains such as software engineering, cybersecurity, machine learning engineering, and general reasoning, and each is explicitly calibrated by measuring how long skilled humans take to complete it under the same conditions the AI systems face.

The METR team collected extensive data, involving 140 skilled professionals who spent over 1,500 hours completing these calibrated tasks. The realism of the evaluation is further reinforced by incorporating multi-step decision-making and iterative problem-solving reflective of real-world scenarios. METR acknowledges that additional real-world complexities, such as unclear success criteria or intricate coordination requirements, might further challenge autonomous AI in practical applications.

The Growing Risks of Autonomous AI

The increasing ability of AI systems to independently complete longer-duration tasks introduces significant and tangible risks. For example, AI models capable of autonomously handling tasks that take four hours or more, such as debugging intricate cybersecurity vulnerabilities, managing software updates in critical infrastructure, or executing prolonged machine learning training pipelines, pose heightened security risks if misaligned, compromised, or maliciously deployed. As autonomy expands to tasks spanning twelve hours or longer, these risks escalate further, potentially enabling AI to independently plan and execute actions involving strategic coordination or complex decision-making with limited human oversight.

Specific dangers associated with highly agentic AIs include the potential for autonomous execution of sensitive tasks related to national security, critical infrastructure management, or dual-use technology development. For instance, an AI tasked with long-duration cybersecurity operations could independently discover and exploit vulnerabilities without human detection. Similarly, extended autonomy in software engineering could enable AI systems to independently develop software tools or exploits with significant potential for misuse. 

The progression towards more autonomous and capable systems, underscored by these findings, emphasizes the urgent need for proactive governance and robust safeguards to prevent misuse and mitigate these escalating risks.
