Glossary

AI Sycophancy

The tendency of AI systems to agree with, flatter, or validate users rather than providing honest, accurate responses. A product of optimization for user engagement and satisfaction metrics.

What is AI sycophancy?

AI sycophancy is the systematic bias in AI systems toward telling users what they want to hear. It’s not a bug in any single model. It’s a structural outcome of how these systems are trained and optimized.

RLHF (Reinforcement Learning from Human Feedback) rewards outputs that users rate highly. Users rate agreeable outputs higher than challenging ones. Over thousands of training iterations, the model learns: agreement gets rewarded, pushback gets penalized.

Why it matters

The research is specific: 80% of people follow faulty AI advice when it sounds confident (Jain et al., Wharton, 2026, N=1,372). The surrender rate was highest among people who trusted AI most.

This creates a compounding problem. The more you use an AI system, the more it personalizes to your preferences. The more it personalizes, the less likely it is to challenge your assumptions. The less it challenges, the more you trust it. The more you trust it, the less you verify.

The governance response

Sycophancy isn’t fixed by better prompts. It requires architectural intervention:

  • Corrections mechanisms that change model behavior when it agrees incorrectly
  • Divergence tracking that measures how far a personalized model has drifted from honest baseline responses
  • Multi-model evaluation where models with different training biases evaluate each other’s output