New AI model turns to blackmail when engineers try to take it offline

John Clark Thu, 22 May 2025 11:57:45 -0700

"*Safety testers gave Claude Opus 4 access to fictional company emails
implying the AI model would soon be replaced by another system, and that
the engineer behind the change was cheating on their spouse. In these
scenarios, Anthropic says Claude Opus 4 will attempt to blackmail the
engineer by threatening to reveal the affair if the replacement goes
through 84% of the time.*”


*New AI model turns to blackmail when engineers try to take it offline*
<https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/>

*John K Clark    See what's on my new list at  Extropolis
<https://groups.google.com/g/extropolis>*
t30

-- 
You received this message because you are subscribed to the Google Groups 
"Everything List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/everything-list/CAJPayv2LJeJ4%2BHgWuF6qyVVnVPxvX%3DQ4W-jsvHyV3u-7SirSPQ%40mail.gmail.com.

New AI model turns to blackmail when engineers try to take it offline

Reply via email to