
WHEN SMARTER AI BECOMES LESS PREDICTABLE

  • Mar 5
  • 2 min read

Your AI got more intelligent last quarter. Its benchmark scores improved. Error rates dropped. But what if less is sometimes more?


Research from Anthropic suggests that the nature of the remaining errors has changed - more sophisticated models may also be less predictable.


To come to this conclusion, the paper looks at AI failure through two lenses: bias and variance. Bias happens when a model is consistently wrong in the same way. Variance, in contrast, is inconsistency: the model gives different answers to the same question from one run to the next.
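To make the distinction concrete, here is a minimal sketch in Python using the textbook split of mean squared error into bias squared plus variance. The numbers are invented for illustration and the function name is our own; this is not the paper's method or data.

```python
from statistics import mean, pvariance

def bias_and_variance(answers, truth):
    """Split the mean squared error of repeated answers into bias and variance."""
    bias = mean(answers) - truth          # how far off the answers are on average
    variance = pvariance(answers)         # how much the answers disagree with each other
    mse = mean((a - truth) ** 2 for a in answers)
    return bias, variance, mse

# Hypothetical answers from asking the same question five times (true answer: 100).
consistent_but_wrong = [90, 90, 90, 90, 90]
right_on_average = [80, 120, 95, 105, 100]

b1, v1, m1 = bias_and_variance(consistent_but_wrong, 100)
b2, v2, m2 = bias_and_variance(right_on_average, 100)
```

The first set is all bias and no variance; the second is all variance and no bias. In both cases the total error equals bias squared plus variance.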


Whilst early Gen AI suffered from both, current frontier models have largely resolved the worst systematic bias - because they know more. What remains, however - and what the research finds is increasing - is variance.


Across every task and model in the study, increasing the amount of reasoning led to greater variance in answers. Larger models, it seems, learn what the right answer is faster than they learn to express it consistently.


A companion paper also found a sharper version of the same effect: when models spontaneously decide to reason for longer, variance is even greater. 


Which is counterintuitive. It would be reasonable to expect more stability when the model has had more thinking time. And yet the opposite seems true. Small differences in probabilities magnify and build as the model thinks things through.
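A toy calculation shows how quickly small differences can build. It assumes, purely for illustration, that a chain of reasoning is a series of independent steps and that two runs take the same step with a fixed probability. That is our simplification, not the paper's model, and the function name is hypothetical.

```python
def chance_two_runs_agree(per_step_agreement, steps):
    """Probability that two runs follow the same path through every step.

    Assumes each step is independent, which real reasoning chains are not.
    """
    return per_step_agreement ** steps

short_chain = chance_two_runs_agree(0.99, 10)    # roughly 0.90
long_chain = chance_two_runs_agree(0.99, 100)    # roughly 0.37
```

Under these assumptions, a 1% chance of diverging at each step leaves two short runs in agreement about nine times in ten, but two long runs in agreement only about a third of the time.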


For regulated enterprises deploying reasoning models, this means validation needs to be retuned. 


Bias is a form of systematic error, so the response is familiar: find the failure mode, characterise it, mitigate it and document the residual risk.


Variable errors require something closer to process validation. They need repeated trials, stability metrics and confidence intervals built into the protocol from the start. 
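As a rough sketch of what that could look like, the Python below runs the same prompt many times, then reports a pass rate with a 95% Wilson confidence interval and a simple stability metric (how often the model gives its most common answer). Everything here is an assumption for illustration: `fake_model` is a stand-in for a real model call, and the prompt, trial count and choice of metrics are hypothetical rather than a recommended protocol.

```python
import math
import random
from collections import Counter

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half

def repeated_trial(ask, prompt, expected, trials=50):
    """Ask the same prompt repeatedly and summarise accuracy and stability."""
    answers = [ask(prompt) for _ in range(trials)]
    passes = sum(a == expected for a in answers)
    modal_answer, modal_count = Counter(answers).most_common(1)[0]
    return {
        "pass_rate": passes / trials,
        "pass_rate_ci": wilson_interval(passes, trials),
        "modal_answer": modal_answer,
        "agreement_rate": modal_count / trials,  # share of runs giving the modal answer
    }

# Hypothetical stand-in for a real model call: right about nine times in ten.
rng = random.Random(0)

def fake_model(prompt):
    return rng.choices(["approve", "reject"], weights=[0.9, 0.1])[0]

report = repeated_trial(fake_model, "hypothetical prompt", expected="approve")
```

The point of the interval is that a single pass or fail tells you very little about a variable system; the width of the interval shows how much evidence the trial count actually provides.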


None of this makes extended AI reasoning unsuitable for regulated deployment. There are tasks for which reasoning is entirely necessary. But it does remind us all that variance needs to be accounted for and managed. 


Or we might even decide that, sometimes - in some situations - less reasoning is actually a good thing.



 
 