AI Model Safety Issues and Multi-Agent Risks

AI Safety | June 13, 2026 | 5 sources

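The latest AI safety developments include a controversy over hidden, overly restrictive guardrails in Anthropic's Claude Fable, which declined to answer basic biology questions and prompted an apology from Anthropic [2][3]. They also include Google DeepMind's research into the risks that emerge when millions of AI agents start to interact [4], and NVIDIA's release of Nemotron 3.5 Content Safety, a customizable multimodal safety model for enterprise AI [5]. Anthropic has also published a review of a year's worth of AI-enabled cyber threats [1].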

Sources

[1] What we learned mapping a year’s worth of AI-enabled cyber threats - Anthropic News
[2] Anthropic apologizes for invisible Claude Fable guardrails - The Verge AI
[3] Claude Fable won’t answer basic biology questions - The Verge AI
[4] Google DeepMind is worried about what happens when millions of agents start to interact - MIT Technology Review AI
[5] Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI - Hugging Face Blog