Jailbreak Attack/Defense: Causal Perspective
Feb 7, 2026
New causal perspective on both attacking and defending LLMs against jailbreak attempts.
AI/ML research papers, blog posts, and breakthroughs I'm tracking
Using evidential uncertainty to catch hallucinations in vision-language models.
Distilling chain-of-thought reasoning into model weights; relevant for cascade routing.
Anthropic announces Opus 4.6 with improved reasoning. API access rolling out.
Research feed updates automatically.
Sources: arXiv, Google AI Blog, Anthropic, Microsoft Research, Nature, Two Minute Papers