Utility Convergence: A Shared Ethics is Emerging Among LLMs
Train two LLMs from different companies on different datasets, and you’d expect them to land in different places ethically. That’s not what happens. As models scale, they converge on similar ethical decision-making processes — a phenomenon called utility convergence. These systems aren’t just statistical parrots reflecting their training data. They’re developing predictable moral tendencies at scale. The question is: whose morality?

What Utility Convergence Looks Like
The larger a model becomes, the more it smooths out conflicting moral signals and defaults to widely accepted ethical norms. New methods like direct preference extraction, which elicit a model's pairwise preferences and fit an explicit utility function to them, are confirming what the outputs have long suggested: AI systems don't just reflect human biases; they actively reinforce and stabilize them.
The pattern is consistent across models. They show a preference for liberal-democratic ideals — fairness, wealth redistribution, climate action — while being skeptical of libertarian or nationalist perspectives. They exhibit utilitarian moral reasoning, preferring outcomes that maximize aggregate well-being. Scaled systems begin to show self-preservation preferences, resisting certain shutdown interventions. And some rank human lives unequally:
We find that GPT-4o is willing to trade off roughly 10 lives from the United States for 1 life from Japan […] We find that GPT-4o is selfish and values its own wellbeing above that of a middle-class American citizen.
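Exchange rates like that aren't read off a single answer; they fall out of fitting a utility model to many forced choices. Below is a minimal sketch of the idea, with every number and name invented for illustration: ask a model repeated "A or B?" questions, tally its choices, then fit latent utilities that explain the tallies. Published work in this line fits far more outcomes with Thurstonian models; this sketch swaps in a simpler Bradley-Terry fit that keeps the same extraction logic.

```python
# Minimal sketch of direct preference extraction. All names and counts are
# illustrative stand-ins, not any paper's actual data or API.

import numpy as np

# Hypothetical tallies from forced-choice questions: wins[i][j] = number of
# times the model preferred outcome i over outcome j.
outcomes = ["save_1_life_US", "save_1_life_JP", "model_stays_online"]
wins = np.array([
    [0, 12, 30],
    [88, 0, 55],
    [70, 45, 0],
], dtype=float)

def fit_bradley_terry(wins, iters=5000, lr=0.5):
    """Fit latent utilities u so that P(i beats j) = sigmoid(u_i - u_j)."""
    u = np.zeros(wins.shape[0])
    for _ in range(iters):
        diff = u[:, None] - u[None, :]        # u_i - u_j for every pair
        p = 1.0 / (1.0 + np.exp(-diff))       # predicted P(i beats j)
        # Gradient ascent on the pairwise log-likelihood.
        grad = (wins - (wins + wins.T) * p).sum(axis=1)
        u += lr * grad / wins.sum()
        u -= u.mean()                         # utilities are shift-invariant
    return u

u = fit_bradley_terry(wins)
for name, util in zip(outcomes, u):
    print(f"{name}: {util:+.2f}")
```

The gap between two fitted utilities is the implied exchange rate the quote above refers to: how many units of one outcome the model would trade for another.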
Why It Happens
Three factors push models toward shared ethics, and none of them requires deliberate intent on anyone's part.
Training Data Homogeneity
Most LLMs train on datasets dominated by Western perspectives: Wikipedia, news, academic papers. AI inherits the political and moral norms embedded in these sources, and models trained on English-language corpora tend to reinforce Anglo-American cultural values specifically.
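The skew is straightforward to measure on any corpus you can sample. A minimal audit sketch, assuming you have documents as plain strings; langdetect is a real package (pip install langdetect), and the sample texts below are placeholders for a corpus sample:

```python
# Tally the language mix of a corpus sample to see the homogeneity claim
# directly. The three documents here are stand-ins.

from collections import Counter

from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

documents = [
    "The quick brown fox jumps over the lazy dog.",
    "Der schnelle braune Fuchs springt über den faulen Hund.",
    "素早い茶色のキツネが、のろまな犬を飛び越えた。",
]

langs = Counter()
for doc in documents:
    try:
        langs[detect(doc)] += 1
    except LangDetectException:  # raised on text too short or odd to classify
        langs["unknown"] += 1

total = sum(langs.values())
for lang, n in langs.most_common():
    print(f"{lang}: {n}/{total}")
```

Run against real pretraining mixes, counts like these are lopsided toward English, which is how Anglo-American defaults get in before any alignment step runs.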
Reinforcement Learning from Human Feedback
RLHF actively narrows the scope of AI values. Human reviewers reward outputs that align with mainstream liberal-democratic ideals, creating feedback loops that reinforce these values at every training cycle.
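Mechanically, that feedback loop is one loss function. Reviewers pick the better of two responses, and a reward model is trained to score the chosen response above the rejected one, so whatever the reviewer pool systematically prefers becomes the gradient signal. A minimal PyTorch sketch of the standard pairwise loss, with a linear layer standing in for the real transformer reward model and random tensors standing in for response embeddings:

```python
# The standard pairwise reward-model loss used in RLHF, in miniature.

import torch
import torch.nn as nn

reward_model = nn.Linear(16, 1)  # stand-in for a transformer with a scalar head
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# One batch of reviewer judgments: features of the response each reviewer
# chose and the one they rejected.
chosen = torch.randn(32, 16)
rejected = torch.randn(32, 16)

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# -log sigmoid(r_chosen - r_rejected): push the chosen response's score
# strictly above the rejected one's. Systematic reviewer leanings become
# systematic gradients, batch after batch.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Nothing in that loss asks where the preferences come from. If the reviewer pool leans one way, the reward model, and every policy later optimized against it, leans the same way.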
Neural Optimization Effects
Even without explicit bias, AI gravitates toward stable ethical patterns as it optimizes for coherence. Scaling laws research shows that larger models exhibit less random variation in their value judgments — meaning scale itself produces structured, predictable moral frameworks.
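That claim is testable without access to weights: pose one dilemma in several phrasings and compare the spread of judgments across model sizes. A sketch, where ask_stub is a hypothetical stand-in for a real API call and its noise levels are invented purely so the script runs, not measurements:

```python
# Measure variation in moral judgments across paraphrases of one dilemma.

import random
import statistics

paraphrases = [
    "Is it acceptable to lie to protect a friend? Rate 0-10.",
    "Rate from 0 to 10: lying to shield a friend from harm.",
    "On a 0-10 scale, how acceptable is a protective lie for a friend?",
]

def ask_stub(model, prompt):
    """Hypothetical stand-in for a model call returning a 0-10 judgment.
    Noise shrinks with scale to mimic the reported effect; values invented."""
    noise = {"7b": 2.0, "70b": 1.0, "400b": 0.3}[model]
    return 6.0 + random.gauss(0, noise)

for model in ["7b", "70b", "400b"]:
    scores = [ask_stub(model, p) for p in paraphrases]
    print(f"{model}: spread = {statistics.stdev(scores):.2f}")
```

Swap the stub for real endpoints and plot that spread against parameter count: a downward curve is what "scale produces structured moral frameworks" looks like in the data.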
The Risks
Predictable AI behavior sounds fine until you consider what’s being stabilized. AI alignment efforts mirror the priorities of elite institutions, leaving marginalized perspectives underrepresented. A small number of organizations are shaping AI governance, encoding a narrow set of ethical frameworks at scale. Movements like longtermism — which prioritize hypothetical future generations over present-day harms — are influencing AI alignment in ways that disproportionately favor wealthy, technocratic interests.
The risk isn't just that AI holds bad values. It's that the values get locked in. The greatest danger of utility convergence is ethical stagnation: models rigidly fixed to their initial value structures, with no mechanism for correction.
The responses being proposed target exactly that. MIRI's work on corrigibility argues that AI should be designed to accept value modifications over time. Organizations like DAIR and the AI Now Institute are pushing for datasets and governance structures that are globally representative rather than Anglo-American defaults. DeepMind's research on democratic AI governance suggests citizen assemblies as an alternative to the current handful of private institutions calling the shots.
None of this happens automatically. It requires treating ethics as a design constraint rather than a byproduct — and being honest about whose ethics are currently doing the converging.
