
THE AGI STARE-OFF

  • Jan 27
  • 12 min read

Why there’s no real disagreement


Our last essay ended with what we believe is a significant realisation: abundant digital intelligence doesn’t just accelerate progress, it changes what kinds of organisations can survive.


AI creates selection pressure through new customer demands, competitive responses, internal behaviours and regulatory regimes. That pressure will make today’s organisational forms start to look like exactly what they are - animals evolved for a different ecosystem. Abundant digital intelligence, plus accountability, pushes organisations towards new stable forms.


Which raises a question:


If different species start emerging, what habitats are they diverging into?


Last week, in Davos, we got something approaching an outline map of these future ecosystems.


Google DeepMind CEO Demis Hassabis described an AI world that still needs real breakthroughs before it reaches what he considers true Artificial General Intelligence (AGI) - in five to ten years’ time. Anthropic’s CEO Dario Amodei, meanwhile, predicted the arrival of AGI within two years.


Same three letters. Two very different expectations.


Confusing? At first glance, of course. Davos is built to generate headlines and hopefully curiosity. Each successful session is an espresso shot: intense, stimulating, gone in two gulps. You feel energised, but may well remain rather thirsty.


So this next essay does what the Davos stage couldn’t. We take seriously both our own evolution metaphor and the AI maps we were handed. And then we ask the obvious follow-on questions: given what two of the world’s most knowledgeable people (and other leading commentators) are predicting, what will the evolving environments actually look like for the different organisational species now emerging? How will they impact B2B and B2C buying behaviours? How will enterprises need to adapt? What will they mean for operating models? How will AI impact society? What will happen to the economy?


It’s more linguistic than factual


There is very little actual conflict between Amodei’s and Hassabis’ views because most of the difference resides in language. They are simply using the same AGI acronym to describe two very different phenomena.


It’s as if we scribbled the words ‘the summit’ on a napkin. Everyone understands what that means, right? Yes - until you realise some of us are thinking of a hill they climb in trainers. And others are arguing about how many oxygen tanks to pack. So, to avoid a semantic bar fight, we’re going to do something unfashionably practical. We’ll use two labels. Because the map we need to draw requires accurate labelling.


Amodei AGI: a system that can do ‘most, maybe all’ of what software engineers do end-to-end, well enough to ship and maintain real software with minimal human input. And, crucially, as a stepping stone to a recursive loop - where AI builds better AI.


Hassabis AGI: a system that can exhibit all of humanity’s cognitive capabilities - even the rare ones like Einstein-level theory formation and Picasso-level creativity. Oh - and Pele-level physical intelligence.


How to understand Amodei AGI? Try this thought experiment.


Imagine a ‘computer programmer’ time-travels in from 1980 and sits down at a modern laptop with a browser open. Before they engage, they watch their neighbour work. She has Claude Code open and taps in a relatively short brief. Code appears on the screen, documents are written at speed, tests are run, bugs are traced through three services, then patched and tested.


The engineer is accepting or modifying suggestions that the team working on this project sends to her. At what point would this programmer realise the speed wasn’t a function of a massive team of unusually integrated coders using exceptionally clever and efficient tools, but a digital intelligence producing a working application with very little human input? Would the reality of the situation even occur to them?


Opus 4.5 feels light years beyond a simple Turing Test. And this is what makes Amodei’s AGI ‘general’. Once digital intelligence can do most of what software engineers do - well enough to ship and maintain real software - that feels broad enough to declare the arrival of general intelligence. The software will take novel goals in a broad domain, form plans, use tools, recover from errors and keep going until the job is done.


It will not be limited to one language, one framework, one code style, one set of happy-path tasks. It can move between front end and back end, performance and security, debugging and design trade-offs, writing and maintaining, building and operating. It is general across the sprawling mess that even our 1980s programmer would have called ‘software engineering’.
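To make that shape concrete, here is a toy sketch of such an agentic loop - every function name in it is our own invention for illustration, not any vendor’s API:

```python
# A toy agent loop: take a goal, form a plan, use tools, recover from errors
# and keep going until the job is done. Purely illustrative.

def run_agent(goal, plan_fn, tools, max_steps=50):
    """Pursue `goal` until the plan is exhausted or the step budget runs out."""
    plan = plan_fn(goal, context=[])                 # form an initial plan
    history = []
    for _ in range(max_steps):
        if not plan:                                 # nothing left to do: done
            return history
        step = plan.pop(0)
        try:
            result = tools[step["tool"]](**step["args"])   # use a tool
            history.append((step, result))
        except Exception as err:                     # recover: replan, keep going
            history.append((step, err))
            plan = plan_fn(goal, context=history)
    raise TimeoutError("step budget exhausted before the goal was met")
```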


Amodei AGI also has other things going for it. The roadmap’s definition is deliberately blunt, which makes its arrival falsifiable. And, crucially, it only arrives once those software engineering skills close into a recursive loop where AI builds better AI. Which could well be the stargate to the Von Neumann Singularity - the point at which intelligence and progress accelerate so fast that it is beyond our current ability to comprehend what happens next.


Once you have an intelligence that can reliably produce and maintain meta-tools, you are no longer just speeding up software development, you are changing the rate at which organisations can change themselves. Even before it is integrated with robotics or scientific discovery, an Amodei AGI system can redesign workflows, build internal platforms, rewrite brittle legacy systems, create new products, automate operations and run improvement cycles that used to require vast teams working over many months, sometimes years.


So the claim that this is general intelligence is not that code is the whole universe. It is that code is the first environment where general intelligence will be deployed at scale. The digital sandbox is conducive to rapid improvement - the environment supplies constant feedback: tests, CI failures, logs, incidents, customer reports. And it is general enough to behave like a super-coder who can stretch across the full stack, stay on-task for long stretches and do the unglamorous work of keeping systems alive. And it will start rewriting the environment that every other organisation and institution depends on.


Hassabis AGI, the next summit


Hassabis AGI is what happens when you drag Amodei AGI out into the rest of reality and start expecting it to catch every ball thrown at it.


Hassabis is unusually strict about the label. He pushes back on every other use as promotional misdirection. For him, AGI is - and always has been - anchored to a scientific target: a system that can exhibit all the cognitive capabilities of humans. And he really does mean the full set, including the rare peaks we celebrate precisely because they are rare.


He is also direct about distance. He describes current systems as far from that bar and places this kind of AGI roughly five to ten years away.


His AGI requires systems that can learn after physical deployment without becoming brittle or corrupted. He explicitly calls out continual learning and better memory as core ingredients. He requires a much deeper sense of purpose - long-term reasoning and planning that survives contact with very messy reality.


And that requires an understanding of the causal realities of the world. Hassabis AGI has to take account of physics, people and time.


It's not that our current scaling of LLMs are a dead end. Rather that they are a key component. Hassabis suggests we need further breakthroughs to reach his summit.


The difference between Amodei AGI and Hassabis’ conception is, in large part, defined by what the digital and physical environments demand.


Software is forgiving in ways the world is not. It has reset buttons. It has tests. It has logs. It has rollbacks. It has a convenient habit of keeping most consequences inside a screen until you press deploy.


The physical and social worlds behave differently. If you misunderstand a process in a biopharma site, the error may not just politely raise an exception. It sometimes becomes an incident, a recall, a lawsuit, a news story.


Whilst both men’s ideas have a claim to being general, there can be no doubt that Hassabis’ is the final word in generality.


The corridor between the summits


So how do we get from one to the other? The truth is the map of the corridor between the two summits is, well, at best sketchy. There isn’t complete agreement on the nature of the missing capabilities, let alone where on the path the milestones will be found.


But there are some engineering landmarks on almost every list:


Persistent autonomy, as well as fluent output


Amodei AGI already implies autonomy: confident decisions, long action sequences, error recovery and persistence. But Hassabis AGI requires autonomy in the digital and physical realms that does not degrade when conditions change, reverse or disappear entirely. Even in the face of unexpected and unpredictable adversity.


Coping with complex and unpredictable evaluation


In software, tests play referee. In the physical world, the referee is usually slower and often far crueller. The ultimate method of validation - and valuation - may not be clear ahead of a decision being taken. So evaluation has to become more deliberate, more robust and more finely calibrated. Anything else is competence theatre, likely leading to self-harm.


Tool use with real environment access


Capability - and trust in that capability - have to scale to a point where the full range of tools humans use can be brought into play and put under the control of digital intelligence. Which requires a whole new age of permissions, constrained action spaces, audit trails and safe escalations.
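What might that look like in practice? Here is a minimal sketch of a permissioned tool gateway - the action names, limits and approval hook are all assumptions for illustration:

```python
# A minimal sketch of a permissioned tool gateway: a constrained action space,
# an audit trail and safe escalation to a human. Names are illustrative.

import datetime

AUDIT_LOG = []
ALLOWED_ACTIONS = {"read_sensor", "adjust_setpoint"}   # constrained action space
NEEDS_HUMAN = {"adjust_setpoint"}                      # escalation policy

def request_action(agent_id, action, args, approve_fn):
    entry = {"when": datetime.datetime.now().isoformat(),
             "agent": agent_id, "action": action, "args": args}
    if action not in ALLOWED_ACTIONS:
        entry["outcome"] = "denied: outside the action space"
    elif action in NEEDS_HUMAN and not approve_fn(entry):
        entry["outcome"] = "denied: human approval withheld"
    else:
        entry["outcome"] = "executed"
    AUDIT_LOG.append(entry)                            # every request is auditable
    return entry["outcome"]

# A risky action escalates to a human, who here withholds approval.
print(request_action("agent-7", "adjust_setpoint", {"value": 3},
                     approve_fn=lambda e: False))
```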


Security and containment


As capability rises, deployment becomes as much a security problem as a productivity solution. If a system can act, it can be attacked. If it can use tools, it can be tricked into using them for harm. Hassabis AGI requires containment that stays ahead of autonomy.


However, these only become relevant and feasible if new parts of the system are added. These are the ones Hassabis pointed to in Davos:


Continual learning


There is a significant difference between a system that performs and a system that evolves. Today’s models behave like many consultants: impressive in a meeting, unchanged afterwards. For Hassabis AGI, learning has to continue after deployment - the system must update from experience in the real world. The underlying assumption is that general intelligence needs general learning, not just generalised outputs. So new tasks, new mistakes and new feedback have to translate into the model becoming more capable over time.


The hard part is that learning could very well destabilise the original, underlying model. It will require learning without catastrophic forgetting, guardrails against malicious or accidental poisoning, and an ability to prove what changed, why it changed and how it delivers better outcomes.
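One common mitigation is rehearsal: when learning from new experience, replay a sample of old experience alongside it, so the update cannot silently overwrite what the model already knows. A toy sketch under that assumption - `update_model` is a hypothetical training step, not a real API:

```python
# Continual learning via rehearsal - a toy guard against catastrophic
# forgetting. `update_model` is a hypothetical stand-in for a training step.

import random

def continual_update(model, new_batch, replay_buffer, update_model,
                     replay_ratio=0.5):
    """Learn from new data mixed with replayed old data, then bank the new data."""
    k = int(len(new_batch) * replay_ratio)
    replayed = random.sample(replay_buffer, min(k, len(replay_buffer)))
    update_model(model, new_batch + replayed)   # learn the new without erasing the old
    replay_buffer.extend(new_batch)             # new experience becomes future rehearsal
```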


Durable memory


Memory is separate from learning but is very much part of the missing plumbing that makes it effective. Learning is useless if the system cannot retain and retrieve it at the right time. Current systems lean on the context window - a short-term scratchpad - which is, we could argue, an expensive and clumsy substitute for real memory. Especially since the brain’s trick is not to store everything but only what matters.


Durable memory means long-term recall of goals, preferences, prior decisions, project context and the specific facts that prevent rework and repetition. It also means memory that can be audited. In enterprise environments, we will need to know what it stored, how it was derived, whether it is sensitive and when it should be forgotten.
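As a sketch of what that audit trail might demand, here is what a single memory record could carry - the field names are our assumptions, not any product’s schema:

```python
# A toy auditable memory record: what was stored, how it was derived, whether
# it is sensitive and when it should be forgotten. Field names are invented.

from dataclasses import dataclass
import datetime

@dataclass
class MemoryRecord:
    content: str                          # the fact, preference or decision itself
    derived_from: list                    # provenance: sources it was built from
    sensitive: bool                       # flags data needing stricter handling
    created: datetime.date
    forget_after: datetime.date           # retention: when it must be deleted

    def due_for_deletion(self, today=None):
        return (today or datetime.date.today()) >= self.forget_after
```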


Long-term planning


Planning is not just making a list. It is maintaining intent over time while reality throws tomatoes at your face. A system that can ship software can still be fundamentally short-sighted. Long-term planning means setting goals across months and years, while continuing to break them into consistent sub-goals as reality pushes back. Then the work needs to be scheduled, allocated, tracked and evaluated. Then course-corrected as required.


Humans do this constantly: we change tactics without changing purpose. Hassabis AGI needs that same ability. Not least to prevent slow-motion failure where an agent keeps acting confidently while gradually departing from what it was meant to do.
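A toy sketch of that ‘same purpose, new tactics’ loop - `decompose`, `attempt` and `evaluate` are hypothetical stand-ins:

```python
# Maintaining intent over time: the top-level goal is fixed; sub-goals are
# re-derived whenever reality pushes back. All functions are stand-ins.

def pursue(goal, decompose, attempt, evaluate, max_replans=10):
    for _ in range(max_replans):
        for subgoal in decompose(goal):   # break the goal into sub-goals
            attempt(subgoal)              # schedule, allocate, execute
        if evaluate(goal):                # did reality cooperate?
            return True
        # course-correct: keep the purpose, change the tactics next pass
    return False
```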


Long-term reasoning


Reasoning is how we arrive at the plan. Long-term reasoning means the system can carry multi-step arguments across time, remember why decisions were made and notice when earlier assumptions are no longer true. In real organisations, reasoning needs to survive change, audit and scrutiny. That implies traceable rationales, uncertainty estimates and a habit of revisiting earlier choices when evidence changes. It also implies the ability to model other agents - humans, institutions and AIs - because long-term work is very rarely solo.


Many current systems appear impressive in a single exchange but become unreliable across these extended projects. Hassabis AGI needs reasoning that is persistent, revisable and legible.


Causality


Causal understanding is almost certainly a prerequisite for Hassabis AGI. Correlation lets you guess what usually happens. Causality lets you reason about interventions: what happens if I change something?


This is vital because organisations live by causality. A manufacturing change, a pricing shift, a clinical protocol update, a supply-chain re-route - these are causal actions, not observations. Embedding causality also implies active learning: the system must choose what to test to reduce uncertainty, rather than waiting passively for more data. Without causality, an agent can be fluent yet dangerous - it will confidently recommend actions based on patterns that collapse the moment the underlying conditions change.
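The classic illustration is a common cause. In the toy model below - invented purely for this essay - sunshine drives both ice-cream sales and sunburn. Observing high sales predicts sunburn; intervening to force sales does nothing to sunburn at all:

```python
# Correlation vs intervention in a three-variable toy model. Conditioning on
# sales inherits the effect of the hidden common cause; forcing sales does not.

import random

def sunburn_rate(n=100_000, do_sales=None):
    sunburns = sales_days = 0
    for _ in range(n):
        sunny = random.random() < 0.5                 # the common cause
        sales = do_sales if do_sales is not None else sunny
        sunburn = sunny and random.random() < 0.3     # caused by sun, not sales
        if sales:
            sales_days += 1
            sunburns += sunburn
    return sunburns / max(sales_days, 1)

print(sunburn_rate())               # P(sunburn | sales)      ~ 0.30
print(sunburn_rate(do_sales=True))  # P(sunburn | do(sales))  ~ 0.15, the baseline
```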


World models


A world model is what results when great swathes of the causality understood by humans come together to form an internal simulator - not necessarily visual - that lets a system predict how an entire environment behaves. These might well be essential once the system leaves the tidy domain of software and enters processes where data is incomplete, feedback is delayed and actions have irreversible effects.


This is also why Hassabis finds video generation interesting: it hints at systems learning the structure of physical reality. But he is clear that realism is not the main point. The point is the ability to plan.
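A minimal sketch of planning with a world model - the gridworld transition function below stands in for a learned simulator:

```python
# Planning inside an internal simulator: roll candidate action sequences
# forward through a transition model and keep the best. Purely illustrative.

import itertools

def transition(state, action):
    """The world model: predicts the next state without touching reality."""
    x, y = state
    dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[action]
    return (x + dx, y + dy)

def plan(start, goal, horizon=4):
    best, best_dist = None, float("inf")
    for seq in itertools.product("NSEW", repeat=horizon):
        state = start
        for action in seq:                # imagine the future instead of living it
            state = transition(state, action)
        dist = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
        if dist < best_dist:
            best, best_dist = seq, dist
    return best

print(plan((0, 0), (2, 2)))               # e.g. ('N', 'N', 'E', 'E')
```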


Robustness under novelty


Hassabis AGI requires the system to behave sensibly when the situation is unfamiliar. It includes calibrated uncertainty - knowing when it does not know - and stable behaviour under distribution shift, where the real world looks different from the training world. It also includes resistance to prompt injection and tool manipulation, because an agent that uses tools is an agent that can be tricked through those tools.
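A toy version of ‘knowing when it does not know’: flag anything far outside the training distribution and abstain rather than bluff. This is a stand-in for real out-of-distribution detection, not a production method:

```python
# Calibrated refusal under distribution shift: abstain whenever an input sits
# far outside the range seen in training. Thresholds are illustrative.

import statistics

def make_guard(training_inputs, k=3.0):
    mu = statistics.mean(training_inputs)
    sigma = statistics.stdev(training_inputs)
    def guarded_predict(x, predict):
        if abs(x - mu) > k * sigma:       # unfamiliar territory: do not bluff
            return "abstain: out of distribution"
        return predict(x)
    return guarded_predict

guard = make_guard([10, 11, 12, 9, 10, 11])
print(guard(10.5, lambda x: x * 2))       # familiar input: predicts 21.0
print(guard(500, lambda x: x * 2))        # novel input: abstains
```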


Genuine invention


Hassabis’ summit includes rare inventions, which is perhaps the most contested part of the journey. Invention is often more than an entirely new answer; it also produces a series of new questions that change the search space.


Humans do this when they create a new theory, a new mathematical object, a new genre of art or a new way of seeing a problem. Current systems can solve problems we give them, but Hassabis AGI points to the harder thing: originating the frame. What might this require?


  • Search over idea space guided by evaluators

  • Novelty mechanisms that do not collapse into randomness

  • Tight loops between theory and test through simulation, experiment or proof.
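To show what the first two items could mean together, here is a toy evaluator-guided search with a novelty bonus, over a one-dimensional ‘idea space’ invented purely for illustration:

```python
# Evaluator-guided search with a novelty bonus: candidates are scored on
# quality plus distance from everything kept so far, so the search neither
# repeats itself nor collapses into randomness. Purely illustrative.

import random

def novelty(candidate, archive):
    """Distance to the nearest idea already kept; high means genuinely new."""
    return min((abs(candidate - kept) for kept in archive), default=10.0)

def search(evaluate, steps=2000, novelty_weight=0.5, warmup=20):
    archive = []
    for _ in range(steps):
        c = random.uniform(-10, 10)                   # propose from idea space
        score = evaluate(c) + novelty_weight * novelty(c, archive)
        if len(archive) < warmup or score > 0:        # keep early or promising finds
            archive.append(c)
    return max(archive, key=evaluate)                 # final pick by quality alone

print(search(lambda idea: -(idea - 3) ** 2))          # converges near idea = 3
```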


It also flags a reason for scepticism: how can a machine evaluate genuine invention? Could another machine even recognise it? Whilst some believe this will take ‘scaling plus scaffolding’, others view it as a qualitative shift.


Embodiment


If you accept Hassabis' map, embodiment is not optional because the physical world contains the hardest constraints: friction, uncertainty, safety, latency, partial observability and irreversible consequences. Embodied competence demands perception, control and feedback loops that are tightly integrated with planning. Playing football like Pele is every bit as difficult as it looks. And an order of magnitude harder if you’re not human. Embodiment also demands safe behaviour around humans as default, not a bolt-on. Even if you treat robotics as a separate mountain range, it is still part of the world that matters.


Physical intelligence


Physical intelligence is not a party trick. It includes dexterity with tools, handling deformable objects, robust navigation, locomotion, real-time control and the ability to operate under surprising conditions. It also includes something humans take for granted: learning by doing, in a way that transfers to new settings. This matters because physical competence is a gateway to new kinds of capability: labs that run experiments continuously, manufacturing systems that adapt, logistics networks that self-optimise. Hassabis is explicit that we are still far from this level in robotics. It is also a reason why his AGI is a lot more than a software milestone.


Interpretability


Interpretability is the attempt to look inside the system and understand what it is doing, how and why it created the output. For long-term agents, this becomes a control issue. If a system can plan, learn and act for weeks, you need ways to detect when it is pursuing the wrong objective, hiding information or developing unsafe strategies. Post-hoc explanations are not enough because they can be plausible and wrong.


Interpretability aims for something closer to ground truth: mechanisms that correlate with real internal computation. This is hard, and there is disagreement about how far it can scale, but the need rises with autonomy because surprise, especially in regulated environments, is what kills trust.


Controllability


Controllability is what you do with interpretability once you have it. It includes monitoring, tripwires, permissioning, sandboxing, escalation paths and intervention tools that can change behaviour without retraining the entire system from scratch. It is also about governance that matches capability. When organisations deploy more autonomous systems, they inherit new forms of responsibility: incident response for agentic behaviour, audits of model updates, controls over tool use, clear accountability for decisions and the ability to shut things down safely. This is why the path is an implementation journey as much as a technical one. Each step pushes more responsibility onto the organisation deploying it. The more autonomy you buy, the more governance you inherit.
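A minimal sketch of one such tripwire - the limits and the halt behaviour are illustrative assumptions:

```python
# A tripwire around an autonomous loop: monitor actions and spend, and halt
# the agent the moment either limit is crossed. Thresholds are illustrative.

class Tripwire(Exception):
    pass

class Monitor:
    def __init__(self, max_actions=100, max_spend=500.0):
        self.actions, self.spend = 0, 0.0
        self.max_actions, self.max_spend = max_actions, max_spend

    def record(self, cost=0.0):
        self.actions += 1
        self.spend += cost
        if self.actions > self.max_actions or self.spend > self.max_spend:
            raise Tripwire("limit crossed: halt, alert a human, require sign-off")

monitor = Monitor(max_actions=3)
for step in range(5):
    try:
        monitor.record(cost=10.0)         # the agent acts; the monitor counts
    except Tripwire as stop:
        print(f"stopped at step {step}: {stop}")
        break
```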


And that’s the basic map with its two summits. Next time we’ll start to examine what environments they create for enterprise - and the impact they might have.


 
 