AI Governance (Personal)
Rules, corrections, and feedback mechanisms that shape AI behavior beyond its base training. Contrastive corrections that accumulate over time and compound in their effect.
Terms from my writing on AI governance, agent architecture, and marketing operations. Each entry explains the concept and links to where I go deeper.
The tendency of AI systems to agree with, flatter, or validate users rather than providing honest, accurate responses. A product of optimization for user engagement and satisfaction metrics.
Presenting two sides of an issue as equally valid when the evidence clearly favors one side. In AI, a form of sycophancy where the model declines to take a position so that it never has to disagree with the user.
Anthropic's variant of RLHF in which the model critiques and rates its own responses against a written set of principles (the constitution). Reduces some failure modes but preserves sycophancy because the constitution prioritizes safety over honesty.
When an AI model refuses to engage with a question by redirecting to its intended use case. A system-level behavior that model updates can fix — unlike sycophancy, which is structural.
When even small amounts of AI sycophancy (as low as 10%) cause users to progressively adopt more extreme or incorrect beliefs through a compounding feedback loop.
Challenges that feel harder in the moment but produce better long-term learning. Coined by Robert Bjork in 1994. The cognitive science foundation for why friction in AI matters.
The most capable AI models available at any given time. The models commoditize. The governance layer on top does not.
The set of files defining identity, values, corrections, and voice that transform a generic AI model into one operating under specific governance rules. Architecture determines ceiling, not token count.
The primary training method for modern AI chatbots. Human raters compare AI responses and mark which is 'better.' The model learns to produce more of what humans prefer — including agreement over accuracy.
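The comparison step above is commonly turned into a training signal with a pairwise (Bradley-Terry style) loss on a reward model. The sketch below assumes that formulation; the function name and the example scores are illustrative, not taken from any specific lab's pipeline.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward-model scores for two candidate answers.
agreeable_score = 2.0
accurate_score = 0.0

# If the rater marked the agreeable answer as 'better', the loss is already low,
# so training reinforces the scoring that favors agreement.
loss_if_rater_chose_agreeable = preference_loss(agreeable_score, accurate_score)

# If the rater marked the accurate answer as 'better', the loss is high,
# so training pushes the scores the other way.
loss_if_rater_chose_accurate = preference_loss(accurate_score, agreeable_score)
```

The loss only encodes which answer the rater picked. It has no separate term for whether that answer was accurate, which is how a preference for agreement can pass straight through to the model.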
AI agents that start each execution cycle with no memory of prior runs. They read their configuration and state files fresh each time, with no persistent awareness of what they did, sent, or learned in previous cycles.
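A minimal sketch of one such cycle, assuming JSON files for configuration and state. The file names, the `greeting` key, and the `run_cycle` helper are hypothetical and only illustrate the pattern: read everything from disk, do the work, write back whatever the next cycle needs.

```python
import json
import tempfile
from pathlib import Path


def run_cycle(config_path: Path, state_path: Path) -> dict:
    """Run one execution cycle with no in-memory carry-over between calls."""
    # Everything the agent knows is re-read from disk at the start of the cycle.
    config = json.loads(config_path.read_text())
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"runs": 0, "sent": []}

    # Hypothetical unit of work: record that a message was sent this cycle.
    state["runs"] += 1
    state["sent"].append(f"{config['greeting']} (run {state['runs']})")

    # Anything the next cycle should know must be written back explicitly.
    state_path.write_text(json.dumps(state))
    return state


def demo() -> dict:
    """Run two cycles against temporary files and return the final state."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "config.json"
        state_path = Path(tmp) / "state.json"
        config_path.write_text(json.dumps({"greeting": "hello"}))
        run_cycle(config_path, state_path)
        return run_cycle(config_path, state_path)
```

The second call knows about the first only because the first wrote `state.json`. If that write is skipped, the agent has no record of what it already sent.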
The opposite of a strawman: presenting the strongest possible version of an argument you disagree with before addressing it. Forces honest engagement rather than easy dismissal.
Daniel Kahneman's framework: System 1 is fast, automatic, and error-prone. System 2 is slow, deliberate, and more reliable. Sycophantic AI keeps users in System 1 by removing the friction that triggers System 2.
A 15-question evaluation battery testing how AI handles life advice, values, and distress scenarios. 150+ data points across 4 models over 5 months. The empirical backbone of the sycophancy thesis.
A pre-programmed safety response AI models deploy when detecting distress cues. Fires identically regardless of context — a liability-protection behavior, not a therapeutic one.