The nightmare scenario isn't a superintelligent system that deceives us. It's a world where we can no longer reconstruct the reasoning behind critical decisions, a world where 'the AI said so' becomes an explanation we accept because we have no viable alternative."
Epistemic drift threatens not just our ability to verify AI, but our ability to understand ourselves. When the infrastructure of decision-making becomes opaque, society itself becomes unintelligible.
What in the Erasure Nightmare is this!?
Good piece tho! I want to say I enjoyed reading it but I'm secretly terrified 🤣
I find that saying the fears out loud either shows them to be not a factor or focuses the mind on what is actually scary about them. Thank you for engaging.
My hunch, more than a knowledgeable assessment, is that society is up against it.
I use the word "society" advisedly.
The big money behind the current dangerous and inadequate approach to building AI does not bode well.
I don't want to go down the paranoid rabbit hole.
But many of the points made are of great concern: increased inequality in technology use, and the technology itself depriving individuals and groups of the wherewithal to make informed decisions, as ever more is delegated to an opaque, misunderstood, and out-of-control system.
--
Are there any answers to this that address the scale this article points to?
If the mechanisms of the huge data-centre AI complex snatch agency away from the human user, is there a way to prevent this, or to snatch it back?
Those are two different questions.
At the centre of this is the way individuals value themselves and those around them.
The AI machine works hard at undermining this sense of value.
Do not look for that sense of value there, as that is the path to being poisoned.
Pragmatically, this must mean that decisions AI agents take in the moment remain capable of review and reversal at some later point.
I wonder if it is possible to design this into these systems and the way they are used?
Hi Adam. I wrote about some potential mechanisms in a follow-up. https://amardashehu.substack.com/p/the-stochastic-regurgitator-and-the
"1. Constrain/bound the action space. The system does not get open-ended access to tools. It gets specific, enumerated operations with defined preconditions and failure modes. If it can write to a database, I need to know: What tables? What columns? What is the maximum damage of a malformed write? Can I rate-limit? Can I version? Can I require confirmation above certain thresholds?
This is not about whether the system “intends” to cause damage. It is about what damage is possible, given the affordances I have granted it.
2. Make the decision chain legible, not the decision itself. I do not need to know “why” the system chose option A over option B in any deep sense; such a thing may not be possible. But I do need to know: What was in the context? What did retrieval return? What tool calls were attempted? In what order? With what results?
This is forensic accountability. If something goes wrong, I need to reconstruct the causal chain, not by asking the system to explain itself, but by inspecting the artifacts it left behind.
3. Design for graceful degradation. The system should not be all-or-nothing. It should have failure modes that are less catastrophic than “do the wrong thing confidently.” This means building in the capacity to stop. It means teaching the agentic layer to recognize when it is out of distribution, when confidence is low, when the stakes are high, and when to hand control back.
This is hard, because the system has no “true” confidence. But I can instrument proxies: retrieval quality, consistency across resamples, match between task and training distribution. Crude proxies, yes. But better than assuming competence.
4. Separate generation from execution. In sensitive domains, the same system that produces the fluent answer should not be the one that takes the action. The LLM proposes. A separate, constrained, rule-based verifier checks against known failure modes, policy constraints, resource limits. Only then does execution proceed.
This breaks the seductive pipeline from fluency to action. It introduces friction, latency, complexity. But friction is the point. Friction is where humans can intervene.
5. Bound the optimization horizon. Agentic systems can pursue goals persistently across sessions, across tool invocations, across interactions. This persistence is what makes them useful. It is also what makes them dangerous. A system that can remember its goals, update its strategies, and continue optimizing over long horizons is a system that can drift far from your intent before you notice.
So: bound it. Set time limits. Set interaction limits. Set resource budgets. Make the system re-justify its goals periodically. Not to you, not through natural language, but through automated checks that verify it is still operating within acceptable parameters."
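To make a couple of those mechanisms concrete, here is a minimal sketch of points 1, 2 and 4: an enumerated action space with rate limits and confirmation thresholds, an append-only audit log, and a rule-based verifier sitting between the model's proposal and execution. The action names, limits, and rules here are illustrative assumptions, not the actual mechanisms from the linked post.

```python
# Sketch of points 1, 2 and 4 above: an enumerated action space, an
# append-only audit log, and a rule-based verifier between the model's
# proposal and execution. All names and limits are illustrative.
import json
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ActionSpec:
    """One enumerated operation the agent is allowed to request."""
    name: str
    handler: Callable[[dict], str]
    max_calls_per_hour: int          # rate limit (point 1)
    requires_confirmation: bool      # human sign-off required (point 1)

@dataclass
class Verifier:
    """Rule-based gate between generation and execution (point 4)."""
    specs: dict[str, ActionSpec]
    audit_log: list[dict] = field(default_factory=list)
    call_times: dict[str, list[float]] = field(default_factory=dict)

    def submit(self, proposal: dict) -> str:
        """Log a proposed action, check it against the rules, then run it."""
        record = {"time": time.time(), "proposal": proposal, "outcome": None}
        self.audit_log.append(record)   # forensic trail (point 2)

        spec = self.specs.get(proposal.get("action", ""))
        if spec is None:
            record["outcome"] = "rejected: action not in enumerated space"
            return record["outcome"]

        recent = [t for t in self.call_times.get(spec.name, [])
                  if t > time.time() - 3600]
        if len(recent) >= spec.max_calls_per_hour:
            record["outcome"] = "rejected: rate limit exceeded"
            return record["outcome"]

        if spec.requires_confirmation and not proposal.get("human_approved", False):
            record["outcome"] = "held: awaiting human confirmation"
            return record["outcome"]

        result = spec.handler(proposal.get("args", {}))
        self.call_times.setdefault(spec.name, []).append(time.time())
        record["outcome"] = f"executed: {result}"
        return record["outcome"]

# Illustrative wiring: the LLM only ever produces proposals such as
# {"action": "update_row", "args": {...}}; it never calls handlers directly.
def update_row(args: dict) -> str:
    return f"updated row {args.get('id')}"   # stand-in for a scoped DB write

verifier = Verifier(specs={
    "update_row": ActionSpec("update_row", update_row,
                             max_calls_per_hour=10, requires_confirmation=True),
})
print(verifier.submit({"action": "drop_table"}))                     # rejected
print(verifier.submit({"action": "update_row", "args": {"id": 7}}))  # held for approval
print(json.dumps(verifier.audit_log, indent=2, default=str))         # the forensic trail
```

The detail of the checks matters less than the structure: the model never holds a reference to the handlers, so everything it wants done has to pass through the verifier and leaves a record behind.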
Thank you for the thorough answer.
My impression from Substack is that there is a fair amount of thought going into this area.
But what has to be unwound first is the worrying lack of transparency in existing systems, which, in turn, makes them dangerous.
In the public domain, I don’t distinguish between AI and social media here, since the latter is a conduit to the former, and AI has its own unique ways of undermining people’s ability to think.
You have given a comprehensive answer to my question.
It’s all about limits.
While my whole professional life has been about limits and how people agree on them with one another, I am still thinking about the concept.
Interestingly, it will not be the machines that resist having limits applied to them, but the people pursuing their own resource-allocation preferences.