How do you know when it’s safe to trust AI more?
You’ve invested in Intent-Driven Development (IDD).
You’re writing specifications that separate human intent from AI implementation. You’ve introduced explicit risk dials at key decision points. Your framework survives both model evolution and architectural change.
Now comes the question every senior leader eventually asks:
“How do we know it’s actually working?”
This question marks the dividing line between organisations stuck in pilot purgatory and the small minority that scale AI successfully. The organisations that succeed don’t guess when to increase automation. They measure. They adjust based on evidence. They move from caution to confidence through proof, not promises.
In Intent-Driven Development (IDD), intent fidelity is the primary control metric. It tells you when AI systems are behaving as intended, when they are not, and when it is safe to trust them more.
This article defines how intent fidelity is measured, and how those measurements give leaders objective confidence to move risk dials from 🔴 to 🟡 to 🟢.
Why the 94% Stay Stuck
In traditional software development, teams could “try it and see.” The cost of failure was bounded: a few developers, a few weeks, a limited blast radius.
Agentic AI changes that equation entirely.
A single agent can generate thousands of lines of code in minutes, deploy infrastructure, modify databases, and integrate across systems. The potential impact of a mistake is no longer measured in developer-hours.
Organisations freeze between two competing fears:
- Move too slowly → competitors who adopt AI faster gain the advantage
- Move too fast → a major failure damages trust and derails adoption
The small percentage of organisations that move beyond pilots resolve this tension through measurement.
They increase automation deliberately, guided by evidence of intent fidelity: the degree to which an AI implementation aligns with clearly articulated human intent.
When intent fidelity is high and stable, automation increases safely. When intent fidelity degrades, control tightens immediately.
Measurement makes trust explicit rather than assumed.
The Four Dimensions of Intent Fidelity
Intent fidelity is not a single score. AI systems fail in different ways, so fidelity must be measured across four distinct dimensions. Without all four, blind spots remain.
Common failure patterns include:
- Building the right thing incorrectly (high completeness, low correctness)
- Building the wrong thing correctly (low alignment, high correctness)
- Building something that works once but degrades over time (low consistency)
Measuring all four dimensions gives a complete, actionable picture.
1. Completeness
Did the AI implementation address everything specified in the intent?
Completeness checks whether all success criteria, constraints, ethical considerations, and validation scenarios defined in the IDD specification were implemented.
Example (shopping cart specification):
- Cart persistence across sessions? ✅
- Accessible from any device? ✅
- Performance under 200ms? ✅
- GDPR deletion capability? ✅
- No dark patterns? ✅
Completeness: 100%
Red flags: missing features, ignored constraints, skipped ethical considerations.
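The checklist above can be sketched as a simple computation. This is a minimal illustration, assuming the specification’s success criteria are tracked as a named checklist; the criterion names are ours, taken from the shopping-cart example.

```python
# Illustrative checklist: True means the criterion was implemented.
# Names are hypothetical, mirroring the shopping-cart example above.
SPEC_CRITERIA = {
    "cart_persists_across_sessions": True,
    "accessible_from_any_device": True,
    "response_under_200ms": True,
    "gdpr_deletion_capability": True,
    "no_dark_patterns": True,
}

def completeness(criteria: dict[str, bool]) -> float:
    """Share of specified criteria the implementation addressed (0.0-1.0)."""
    return sum(criteria.values()) / len(criteria)

print(f"Completeness: {completeness(SPEC_CRITERIA):.0%}")  # Completeness: 100%
```

Dropping any single criterion (say, GDPR deletion) immediately drops the score to 80%, which is exactly the kind of silent gap this dimension exists to catch.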
2. Correctness
Does the implementation actually work as specified?
Correctness measures whether the implementation:
- Passes validation tests
- Handles edge cases
- Fails safely
- Meets performance and security requirements
Example:
- Cart persistence works ✅
- Loads in 850ms ❌ (spec required <200ms)
- Breaks when cart exceeds 100 items ❌ (should fail gracefully)
Correctness: 67%
Red flags: edge-case failures, performance regressions, security vulnerabilities.
3. Alignment
Does the implementation reflect the real intent behind the specification?
Alignment is the most critical, and least automatable, dimension. It requires human judgment about whether the solution solves the right problem, not just the stated one.
Example:
Intent: “Enable users to save items for future purchase.”
- Implementation A: Saves cart for 24 hours, then deletes ❌
- Implementation B: Saves cart indefinitely until user deletes ✅
Only the second implementation reflects the underlying user intent.
Alignment failures are not tooling failures; they are intent-interpretation failures.
Red flags: stakeholder feedback of “that’s not what we meant,” domain model violations, technically correct but conceptually wrong solutions.
4. Consistency
Does the implementation fit coherently within the existing system?
Consistency measures adherence to architectural patterns, domain conventions, and system design principles.
Example:
- System standard: event-driven state changes
- AI implementation: direct database writes ❌
- No domain events emitted ❌
Red flags: architectural violations, divergent patterns, silent introduction of technical debt.
Calculating Intent Fidelity
Intent fidelity combines all four dimensions into a single, trackable metric.
Intent Fidelity =
(Correctness × 0.35) + (Completeness × 0.25) + (Alignment × 0.25) + (Consistency × 0.15)
Why these weights?
- Correctness (35%): broken systems fail regardless of intent
- Completeness (25%): missing intent creates silent gaps
- Alignment (25%): solving the wrong problem wastes all effort
- Consistency (15%): refactorable, but still costly
The specific weights matter less than consistency and coverage. Organisations may tune them, but all four dimensions must remain present.
Example:
- Correctness: 85%
- Completeness: 100%
- Alignment: 90%
- Consistency: 95%
Intent Fidelity = 91.5%
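The weighted score above can be expressed directly in code. The weights come from the article; the function itself is a sketch, not a prescribed implementation, and organisations tuning the weights would simply edit the table (keeping all four dimensions present).

```python
# Weights from the article; they must sum to 1.0.
WEIGHTS = {
    "correctness": 0.35,
    "completeness": 0.25,
    "alignment": 0.25,
    "consistency": 0.15,
}

def intent_fidelity(scores: dict[str, float]) -> float:
    """Combine the four dimension scores (each 0-100) into one metric."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        # All four dimensions are mandatory; a partial score hides blind spots.
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

scores = {"correctness": 85, "completeness": 100, "alignment": 90, "consistency": 95}
print(f"Intent fidelity: {intent_fidelity(scores):.1f}%")  # Intent fidelity: 91.5%
```

Raising an error on missing dimensions enforces the rule stated above: the exact weights matter less than coverage, so a score computed from three dimensions should never be reported as intent fidelity.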
Measuring Human Intent Fidelity (The Baseline Principle)
In Intent-Driven Development, intent fidelity is measured at the implementation boundary, not at the actor. Whether an implementation is produced by:
- a human engineer
- an AI agent
- or a human–AI collaboration
…it is evaluated against the same four dimensions: completeness, correctness, alignment, and consistency.
This is a foundational rule of IDD: humans are not exempt from measurement, and AI is not held to a harsher standard. Intent is the contract, and all implementations are judged against it.
Why the Human Baseline Matters
Without measuring human implementations:
- AI has no credible reference point
- failures are misattributed to tooling rather than unclear intent
- organisations mistake anecdote for governance
IDD requires a human baseline to establish what “good” looks like in practice.
This baseline is not the best engineer on their best day. It is a representative view of how intent has historically been implemented across the organisation.
Interpreting the Results
Human and AI implementations use the same scoring model, but interpretation differs.
When human intent fidelity is low, root causes typically include:
- incomplete or ambiguous intent specifications
- undocumented domain assumptions
- coordination and time-pressure effects
When AI intent fidelity is low, root causes typically include:
- specification gaps
- domain modelling weaknesses
- inappropriate autonomy for the task
A low score does not indict the actor; it diagnoses the system.
When both human and AI scores are low, the issue is neither people nor machines.
It is intent quality.
The Executive Insight
Intent-Driven Development does not ask leaders to trust AI more than humans.
It asks them to trust measurement more than intuition.
Holding humans and AI to the same intent fidelity standard makes governance fair, defensible, and scalable.
How to Measure: The Comparative Build Method
The most reliable way to measure intent fidelity is comparison against a human baseline.
Process:
- Parallel implementation: Same IDD specification implemented by both human and AI
- Compare outcomes: Identical validation, performance tests, and stakeholder review
- Analyse divergence: differences are categorised as:
  - AI limitation (maintain 🔴)
  - Equally valid alternative (acceptable)
  - Improvement (AI outperforms baseline)
  - Specification ambiguity (fix intent, not AI)
This works because AI is not measured against perfection; it’s measured against what “good” already looks like in the organisation.
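One lightweight way to operationalise the divergence analysis is to tally categories across a batch of comparative builds. The category names mirror the list above; everything else here (the enum values, the batch, the interpretation comment) is illustrative, not part of the article’s method.

```python
from collections import Counter
from enum import Enum

class Divergence(Enum):
    """Divergence categories from the comparative build method."""
    AI_LIMITATION = "maintain 🔴"
    VALID_ALTERNATIVE = "acceptable"
    IMPROVEMENT = "AI outperforms baseline"
    SPEC_AMBIGUITY = "fix intent, not AI"

def summarise(batch: list[Divergence]) -> Counter:
    """Tally divergence categories for a batch of human-vs-AI comparisons."""
    return Counter(batch)

batch = [
    Divergence.VALID_ALTERNATIVE,
    Divergence.SPEC_AMBIGUITY,
    Divergence.SPEC_AMBIGUITY,
    Divergence.IMPROVEMENT,
]
summary = summarise(batch)
# Spec-ambiguity dominating the tally points at intent quality, not the AI:
# fix the specification before touching the dial.
print(summary[Divergence.SPEC_AMBIGUITY])  # 2
```

The payoff of recording the category, not just a pass/fail verdict, is that the remediation differs per category: an AI limitation holds the dial, while a specification ambiguity sends the team back to the intent document.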
When to Move Risk Dials
Intent fidelity scores determine when automation can safely increase.
Single-Agent Systems
Stay at 🔴 (Full human review):
- Intent fidelity <85%
- Any alignment failures
- First 10–20 implementations of a new task type
- Security, compliance, or ethical concerns
Move to 🟡 (Spot-check):
- 85–95% fidelity sustained across 20+ implementations
- No recent alignment failures
- Well-bounded task category
Move to 🟢 (Monitoring):
- ≥95% fidelity sustained across 50+ implementations
- Automated validation reliably detects issues
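The single-agent thresholds above can be encoded as a decision function. The numeric cut-offs come from the article; the function itself is a sketch under our own simplifications (for instance, the “automated validation reliably detects issues” condition for 🟢 is folded into the sensitivity flag and evidence counts here).

```python
def recommend_dial(fidelity_pct: float, n_implementations: int,
                   recent_alignment_failure: bool, sensitive_domain: bool) -> str:
    """Map intent-fidelity evidence to a risk-dial recommendation.

    sensitive_domain covers security, compliance, and ethical concerns,
    which hold the dial at red regardless of the numbers.
    """
    if (sensitive_domain or recent_alignment_failure
            or fidelity_pct < 85 or n_implementations < 20):
        return "🔴 full human review"
    if fidelity_pct >= 95 and n_implementations >= 50:
        return "🟢 monitoring"
    return "🟡 spot-check"

print(recommend_dial(96, 60, False, False))  # 🟢 monitoring
print(recommend_dial(88, 25, False, False))  # 🟡 spot-check
print(recommend_dial(97, 80, False, True))   # 🔴 full human review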
Multi-Agent Systems
Different agents earn trust at different rates.
- Test agents: move to 🟢 fastest
- Security agents: often remain 🔴 permanently
- Backend / frontend agents: earn 🟡 after 20–30 successes
- Architect / coordinator agents: remain 🔴 longest
Architectural mistakes carry the highest long-term cost.
Demonstrating ROI to Leadership
Measurement enables translation from technical metrics to business outcomes.
The timelines below illustrate how measurement compresses uncertainty over time. They describe confidence progression, not fixed delivery schedules.
Velocity Improvement
Before IDD:
2 developers × 3 weeks = 6 developer-weeks
After IDD (🟡 stage):
- Specification: 2 days
- AI implementation + spot-check: 2 days
Result: ~6× velocity improvement with maintained quality
(Illustrative, directionally consistent with early adopters.)
Risk Reduction
Without measurement:
- 15% of AI-generated changes cause production issues
- Cost per issue: $50k
- Annual cost: $750k
With intent fidelity measurement:
- 2% issue rate
- Annual cost: $100k
- ≈ $650k annual risk reduction
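The arithmetic behind those figures is straightforward to reproduce. Note one assumption of ours: the numbers imply roughly 100 AI-generated changes per year ($750k ÷ $50k = 15 issues at a 15% rate), which the article does not state explicitly.

```python
def annual_issue_cost(changes_per_year: int, issue_rate_pct: int,
                      cost_per_issue: int) -> int:
    """Expected yearly cost of production issues, in dollars."""
    return changes_per_year * issue_rate_pct * cost_per_issue // 100

# Assumed volume: ~100 AI-generated changes per year (our inference).
without_measurement = annual_issue_cost(100, 15, 50_000)  # $750,000
with_measurement = annual_issue_cost(100, 2, 50_000)      # $100,000
print(without_measurement - with_measurement)  # 650000
```

The point of writing it down is that the risk-reduction claim scales linearly: double the change volume and the annual saving doubles with it, which is what makes measurement more valuable as automation grows.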
Confidence to Scale
Time to production confidence:
- Without measurement: 6–12 months stuck in pilots
- With intent fidelity: predictable progression to scale within 9 months
Measurement Cadence
Measure deliberately, not continuously.
- Weeks 1–4: measure every implementation (baseline)
- Months 2–3: measure every third implementation
- Months 4–6: selective measurement by category
- Month 7+: quarterly sampling and incident-driven reviews
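A 1-in-N sampling policy is one simple way to implement that cadence; the phase-to-rate mapping below is a hypothetical reading of the schedule above, not a prescribed mechanism.

```python
def should_measure(impl_index: int, sample_every: int) -> bool:
    """Measure every `sample_every`-th implementation (1 = measure all)."""
    return impl_index % sample_every == 0

# Months 2-3: every third implementation gets a full fidelity score.
measured = [i for i in range(1, 10) if should_measure(i, 3)]
print(measured)  # [3, 6, 9]
```

Whatever the sampling rate, incident-driven reviews should bypass it entirely: an alignment failure always triggers measurement, regardless of where it falls in the schedule.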
How This Fits the Full IDD Journey
- Article 1: AI builds fast, but is it building the right thing?
- Article 2: IDD integrates UCD, DDD, BDD, TDD around intent
- Article 3: Risk dials provide explicit human control
- Article 4: IDD survives model evolution
- Article 5: IDD scales across architectures and agents
- Article 6: Intent fidelity measurement provides evidence to scale safely
This is how AI adoption is de-risked:
- Separate the stable (human intent) from the fluid (AI implementation)
- Govern with explicit risk dials
- Survive inevitable evolution
- Measure intent fidelity
- Scale where evidence supports it
The organisations that succeed do all five.
Your Next Steps
If you are implementing IDD:
- Month 1: establish baseline intent fidelity
- Months 2–3: identify patterns, improve specifications
- Months 4–6: move to 🟡 selectively where evidence supports it
- Month 7+: scale with confidence, backed by data
You don’t need to trust AI blindly.
You don’t need to stay cautious indefinitely.
You measure intent fidelity.
You adjust risk dials based on evidence.
You scale where data supports it.
This is how confidence replaces caution.
#IntentDrivenDevelopment #IDD #IntentFidelity #AIGovernance #EvidenceBasedAI #TechLeadership
Check out the other articles in this series …
Intent-Driven Development via Multi-Agent Systems
Multi-agent systems are emerging as the next evolution in AI-powered development, but they don’t change how we should specify human intent. By separating intent from AI architecture, Intent-Driven Development ensures specifications remain stable, tool-agnostic, and future-proof, no matter how agents, models, or orchestration patterns evolve.
Intent-Driven Development: Maturity Model
In today’s AI-accelerated world, the challenge isn’t whether technology can build software faster, it’s whether organisations can ensure that what gets built actually reflects human intent. Traditional maturity models tend to measure adoption by counting tools or automated outputs, but this risks conflating activity with alignment. True capability emerges not from the number of agents deployed, but from an organisation’s capacity to expand autonomy while preserving clarity, accountability and control.