Meta Conducts Limited Safety Testing of Llama 3.1

Jason Green-Lowe, July 26, 2024

Last Tuesday, Meta released Llama 3.1, which it describes as the “first frontier-level open source AI model.” It was trained with 3.8 × 10^25 FLOP, enough to require pre-registration and basic benchmark testing under the model legislation proposed by the Center for AI Policy (CAIP), but not quite at the 10^26 FLOP frontier threshold that would trigger a full licensing application.
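To make the threshold comparison concrete, here is a minimal sketch of how a tiered compute rule like the one described above could be checked. The 10^26 FLOP frontier figure comes from the discussion above; the lower pre-registration threshold in the sketch is a placeholder chosen for illustration, not a figure taken from CAIP's model legislation.

```python
# Illustrative sketch of a tiered compute rule (not CAIP's actual legislative text).
# The 1e26 FLOP frontier figure is from the post; the pre-registration tier below
# is a placeholder value used only to make the example runnable.

FRONTIER_LICENSING_FLOP = 1e26   # threshold for a full licensing application (per the post)
PREREGISTRATION_FLOP = 1e25      # placeholder tier for pre-registration and benchmark testing

def compliance_tier(training_flop: float) -> str:
    """Return the illustrative oversight tier for a given training compute budget."""
    if training_flop >= FRONTIER_LICENSING_FLOP:
        return "full licensing application"
    if training_flop >= PREREGISTRATION_FLOP:
        return "pre-registration and basic benchmark testing"
    return "no additional requirements"

# Llama 3.1's reported training compute falls in the middle tier:
print(compliance_tier(3.8e25))  # -> "pre-registration and basic benchmark testing"
```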

There are several troubling issues with the Llama 3.1 release. The draft standard from the Open Source Initiative calls on open-source models to provide “sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data.” However, Meta says only that its data comes from “a variety of data sources,” which falls far short of that standard.

Professor Kevin Bankston, the Senior Advisor on AI Governance for the Center for Democracy & Technology, also notes that the model comes with “extensive use restrictions,” which are inconsistent with open source methodology, and appears to lack “evals or mitigations for bias around race, gender, or other protected categories.”

Another concern about Llama 3.1 is its energy usage: Meta admits that swings in GPU activity “can result in instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid.” As the power demands of AI training runs increase, it appears that we will need measures to limit the resulting pollution and ensure that consumers are not left without electricity.

CAIP’s special focus is on catastrophic risk: we are concerned that powerful AI models could help terrorists develop weapons of mass destruction, destabilize essential infrastructure, or give rise to rogue AI agents spreading unchecked across the Internet.

To its credit, Meta voluntarily tested Llama 3.1 to find out whether its new model would exacerbate these risks. As far as we can tell from its technical report, though, these tests were run only on the safest version of Llama 3.1: the one that had been carefully fine-tuned to avoid any dangerous tendencies. As previous researchers have shown, such fine-tuning can be removed easily and cheaply; the safety guardrails on the Llama 2 model were stripped out at a cost of only $200. We see no evidence that Meta ran any tests or red-teaming exercises to find out whether its model could assist with developing biological weapons after Llama 3.1’s guardrails were removed.

This is like hitting a home run off of a T-ball stand and then bragging about your baseball skills. The whole point of publishing your model weights online for everyone to download is to encourage the public to experiment with your tools and see what novel capabilities they can unlock. If you only test your model’s safety on its most boring, locked-down version, then there’s no reason to think it’s actually safe.

Meta has essentially run a closed-source safety test on an open-source AI system. This is not enough to protect the American public. We urge Meta to improve its safety protocols, and we urge Congress to set minimum safety standards like the ones found in CAIP’s model legislation so that these kinds of errors will not continue to be overlooked.
