How organisations build the capability to expand agentic systems while preserving alignment and accountability
Over the first six articles on Intent-Driven Development, the focus has moved deliberately from principle to structure.
We began by examining the foundational problem: the widening gap between the speed of AI-enabled implementation and the clarity of human intent. We explored how IDD integrates with established practices such as user-centred design (UCD), domain-driven design (DDD), behaviour-driven development (BDD) and test-driven development (TDD), placing intent at the centre of delivery rather than treating it as an implicit assumption. We then examined the necessity of explicit human gates and risk dials, argued that separating intent from implementation creates resilience to model and architectural evolution, and finally introduced intent fidelity as a measurable signal for governing progression.
Having established structure, governance and measurement, the natural next question is one of progression.
How does an organisation expand AI autonomy without losing control over what matters? How does it escape perpetual cautious experimentation without tipping into premature delegation that undermines trust? And how does it recognise that not all contexts require maximal automation?
The maturity model described here emerges directly from IDD principles. It is not a technology adoption ladder. It does not measure how many agents are deployed or what percentage of tasks are automated. Nor does it assume that progress is defined by moving risk dials uniformly toward green. Instead, it defines organisational capability: the ability to expand autonomy while sustaining alignment between human intent and implemented outcome.
Advancement within IDD maturity is governed by evidence, not enthusiasm. Risk dials adjust only where sustained intent fidelity demonstrates reliability. Where evidence is insufficient, autonomy does not expand. Where context demands greater caution, plateau is not failure but prudence.
What differentiates this model from conventional maturity frameworks is that progression is gated by alignment rather than adoption. Traditional governance often concentrates on inspecting artefacts after they are produced. IDD governs at the level of intent itself, constraining implementation before output emerges. As a result, maturity reflects an organisation’s capacity to maintain alignment as complexity and autonomy increase.
Level 1: Supervised Learning
The initial stage is characterised by universal conservatism. All risk dials remain fully engaged. Every AI-generated artefact is subject to explicit human review. This phase is frequently perceived as slow, but its purpose is calibration rather than optimisation.
During Level 1, the organisation learns where AI operates reliably within its specific domain, architecture and compliance environment. It establishes a shared understanding of what “complete” and “correct” mean in practice. Architectural conventions are clarified. Ethical and regulatory thresholds are made explicit. Most critically, intent fidelity measurement is embedded as a structural capability rather than an optional audit.
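To make "measurement as a structural capability" concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than prescribed IDD tooling: the names `IntentCriterion` and `fidelity_score`, and the weighting scheme, are assumptions. The idea is simply that each implementation is recorded against a weighted checklist of explicit intent criteria, yielding a fidelity score in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentCriterion:
    """One explicit, checkable statement of intent (hypothetical structure)."""
    description: str
    weight: float     # criticality: regulatory criteria can be weighted to dominate
    satisfied: bool   # outcome of human review or automated verification


def fidelity_score(criteria: list[IntentCriterion]) -> float:
    """Weighted share of intent criteria the implementation satisfies, in [0, 1]."""
    total = sum(c.weight for c in criteria)
    if total == 0:
        return 0.0
    return sum(c.weight for c in criteria if c.satisfied) / total


review = [
    IntentCriterion("Rejects malformed input with a typed error", 2.0, True),
    IntentCriterion("Writes an audit log entry for every state change", 3.0, True),
    IntentCriterion("Response schema matches the agreed contract", 1.0, False),
]
print(round(fidelity_score(review), 3))  # 0.833
```

The weighting is where ethical and regulatory thresholds become explicit: a compliance criterion can be made to dominate the score, so a "mostly right" implementation still fails its gate.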
The objective at this stage is not speed. It is clarity. Without documented patterns of success and failure across a meaningful number of implementations, any subsequent expansion of autonomy would rely on assumption.
Advancement from Level 1 becomes appropriate only when sustained measurement demonstrates stability, review processes are efficient and trusted, and governance mechanisms are consistently applied across teams.
Level 2: Selective Delegation
Once evidence accumulates, autonomy begins to differentiate. Low-risk and demonstrably stable categories of work may transition to monitored or spot-checked execution, while high-consequence domains remain tightly supervised.
Delegation within Level 2 is always evidence-gated. Movement of a risk dial from full review toward selective oversight must be supported by documented intent fidelity across multiple comparable implementations. Where reliability falters, conservatism resumes.
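What evidence-gating can look like mechanically is sketched below. The oversight ladder, threshold and window size are all assumptions chosen for illustration, not prescriptions: a dial-movement policy might loosen oversight one step at a time, and revert the moment reliability falters.

```python
# Hypothetical oversight ladder, from most to least conservative.
FULL_REVIEW, SPOT_CHECK, MONITORED = "full_review", "spot_check", "monitored"
LADDER = [FULL_REVIEW, SPOT_CHECK, MONITORED]


def next_oversight(current: str, fidelity_history: list[float],
                   min_samples: int = 20, threshold: float = 0.95) -> str:
    """Evidence-gated movement of one risk dial (illustrative policy).

    Oversight loosens by exactly one step, and only when a sufficient run of
    comparable implementations sustains fidelity at or above the threshold.
    """
    recent = fidelity_history[-min_samples:]
    if any(score < threshold for score in recent):
        return FULL_REVIEW        # reliability faltered: conservatism resumes
    if len(recent) < min_samples:
        return current            # evidence insufficient: autonomy does not expand
    return LADDER[min(LADDER.index(current) + 1, len(LADDER) - 1)]


print(next_oversight(FULL_REVIEW, [0.97] * 20))  # spot_check
```

The asymmetry is deliberate: loosening requires a full window of sustained evidence, while a single sub-threshold result is enough to resume full review.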
At this stage, organisations typically observe measurable acceleration relative to pre-AI baselines. Gains are meaningful, though not revolutionary, and are achieved without compromising security, compliance or architectural integrity.
A common failure within Level 2 arises from inconsistency. If different teams apply measurement criteria unevenly, comparisons become unreliable and trust in the framework erodes. Enterprise-scale adoption requires measurement standardisation to be treated as critical infrastructure rather than administrative overhead.
Level 3: Scaling with Sustained Alignment
Level 3 represents a structural shift rather than a symbolic milestone. The majority of routine, well-bounded activities operate under differentiated oversight. Measurement remains active and visible, ensuring that regression is detected through data rather than anecdote.
It is important to clarify what this stage does not imply. IDD maturity is not predicated on replacing human engineers with agents, nor does it privilege machine output over human judgment. The framework remains actor-agnostic. Human and agent implementations are evaluated against the same intent-defined standards. Autonomy expands only where evidence supports it, regardless of the source of output. In some contexts, human implementation will remain superior in handling ambiguity. In others, agents may demonstrate greater consistency in bounded execution. The model concerns alignment, not superiority.
What does change materially at this stage is the composition of work. Engineering effort shifts progressively away from inspecting output line by line and toward articulating intent precisely. As specifications mature and measurement provides reliable feedback loops, the locus of value creation moves upstream.
Engineering roles evolve from reviewers of output to designers of intent (more on this in the next article).
This evolution is one of the clearest indicators of genuine maturity. It reflects not merely increased efficiency, but deeper integration of AI capability into organisational thinking.
For many enterprises, Level 3 represents an optimal steady state. It captures substantial efficiency improvement while preserving explicit human authority over high-risk decisions.
Level 4: Continuous Optimisation
At Level 4, the organisation transitions from controlled expansion to structural redesign. Workflows are reconsidered in light of AI capabilities. Specifications evolve deliberately to improve clarity for both human and machine interpretation. Measurement systems operate continuously and influence architectural decisions.
This stage typically requires sustained leadership sponsorship and cross-functional participation beyond engineering alone. It may yield measurable financial impact and differentiation, but it also demands organisational willingness to revisit established patterns of working.
Not every organisation will pursue, or require, this level of transformation. In heavily regulated contexts or risk-sensitive domains, plateau at Level 3 may represent mature equilibrium rather than incomplete ambition.
Anti-Patterns Across Maturity
Three anti-patterns commonly disrupt progression.
Some organisations remain trapped in perpetual pilots, maintaining full supervision indefinitely without defining evidence-based advancement criteria. Caution becomes default rather than calibrated.
Others accelerate prematurely on the basis of early success, expanding autonomy before measurement demonstrates sustained stability. A single high-impact failure then collapses trust and triggers regression.
A third pattern emerges when governance is documented but not enforced. Risk dials exist conceptually yet are adjusted informally. Measurement occurs intermittently rather than systematically. Governance becomes symbolic rather than structural.
All three reflect misalignment between autonomy and evidence.
Regression, Plateau and Context
Progression through these levels is neither linear nor inevitable. New domains may require temporary return to stricter supervision until sufficient evidence accumulates. Regression under uncertainty is not a sign of failure but an expression of disciplined risk management.
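One way to encode that discipline is a per-domain lookup in which full review is the default for any domain that has not yet earned a dial setting through accumulated evidence. This is a hypothetical sketch; the domain names and level labels are invented for illustration.

```python
# Hypothetical per-domain dial settings earned through accumulated evidence.
dials = {"crud-endpoints": "monitored", "report-formatting": "spot_check"}


def oversight_for(domain: str, dials: dict[str, str]) -> str:
    """Unknown or newly opened domains default to full review,
    regardless of the organisation's overall maturity level."""
    return dials.get(domain, "full_review")


print(oversight_for("payments-reconciliation", dials))  # full_review
```

The point of the default is that maturity level never substitutes for evidence: a Level 3 organisation entering a new domain starts it under Level 1 supervision.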
Likewise, plateau may represent maturity rather than stagnation. The appropriate level of autonomy is contextual. The guiding question is not how quickly risk dials can move toward full delegation, but whether each adjustment preserves sustained alignment with intent.
The Core Consideration
Organisations that capture enduring value from AI are not distinguished solely by the sophistication of their tooling. They are distinguished by their capacity to scale autonomy while preserving control over what matters.
Intent-Driven Development provides the structural model for that balance.
Measurement provides the signal that governs expansion.
Leadership provides continuity of commitment.
Context determines the appropriate destination.
Scaling autonomy without losing control is therefore not a technological problem but an organisational one. Maturity reflects the capability to maintain alignment as autonomy increases, and to expand only where evidence supports it.
That discipline, more than any individual tool or model, defines enterprise-grade AI adoption.