AI Will Be Happy to Help You Bluid a Bomb | Center for AI Policy | CAIP
“Best-of-N Jailbreaking,” revealed one of the glaring problems with this paradigm. Ordinary jailbreaking is just the art of getting AI models to do tasks for you that are supposedly locked away by their reinforcement learning.…
www.centeraipolicy.org/work/ai-will-be-happy-to-help-you-bluid-a-bomb