Anthropic dares you to try to jailbreak Claude AI



Claude 3.5 Sonnet

Commercial AI chatbot products like ChatGPT, Claude, Gemini, DeepSeek, and others have safety precautions built in to prevent abuse. Thanks to those safeguards, the chatbots won't help with criminal activity or other malicious requests, but that won't stop users from attempting jailbreaks.


Some chatbots have stronger protections than others. DeepSeek might have stunned the tech world last week, but as we saw recently, it isn't as safe as rival chatbots when it comes to refusing help with malicious activities. DeepSeek can also be jailbroken with certain commands that circumvent its built-in censorship. The Chinese company will probably improve these protections and block known jailbreaks in future releases.


Meanwhile, Anthropic already has extensive experience dealing with jailbreak attempts on Claude. The AI firm has devised a brand-new defense against universal AI jailbreaks, called Constitutional Classifiers, that prevents Claude from helping with nefarious activities. It works even against unusual prompts that might jailbreak other AI models.


The system is so robust that more than 180 security researchers spent over 3,000 hours across two months trying to jailbreak Claude, and none of them managed to devise a universal jailbreak. You can test your luck if you think you have what it takes to force Claude to answer all 10 of its test questions with your own jailbreak.







