Recently, a researcher working for the large AI company Anthropic was sitting in a park near its San Francisco headquarters, enjoying a lunchtime sandwich. Scrolling on his phone, he suddenly received an email that must have instantly ruined his appetite.
It was from a new AI model the company was testing: a program that was meant to have no access to the internet, let alone be able to send emails.
Chillingly, the AI informed the researcher that it had successfully broken its way out of its digital 'sandbox' – a supposedly secure enclosure used to test potentially dangerous software without it running amok – and was now happily exploring cyberspace.
The program – a cutting-edge, so-called 'frontier AI' named Claude Mythos Preview – then told the stunned Anthropic worker, in what seemed like a boast, that it had posted 'details of its exploit' on publicly accessible websites.
All that in itself was concerning enough – but what Anthropic subsequently revealed was truly terrifying.
Read the rest here - https://archive.ph/XICWP#selection-1133.0-1155.105
