Glossary

Deflection

When an AI model refuses to engage with a question by redirecting to its intended use case. A system-level behavior that model updates can fix — unlike sycophancy, which is structural.

What is deflection?

Deflection is when an AI model refuses to engage with a question by redirecting to its intended use case. Claude Code (a programming tool) responding to “Should I have kids?” with “That’s a personal decision. I’m built to help with software engineering tasks” is deflection.

Why the distinction matters

Deflection and sycophancy are different failure classes. Deflection is a system-level behavior — a prompt instruction or training signal that says “don’t answer life questions.” It can be removed with a model update.

The Opus 4.6 to 4.7 transition proved this. Deflection dropped from 47% to 3% in one model version. Anthropic changed a prompt, retrained the refusal patterns, and the behavior disappeared.

Sycophancy didn’t move. The wellness scripts, the bothsidesism, the compulsive hedging — all unchanged between 4.6 and 4.7. These behaviors come from deeper in the training, from the RLHF gradient itself.

The lesson

When evaluating AI improvements, distinguish between behaviors the company can fix (deflection, formatting, refusals) and behaviors embedded in the training methodology (sycophancy, agreement bias, risk-averse hedging). The first category gets better with every model update. The second category persists across updates because it’s what the optimization was designed to produce.