INTENT Framework v0.6 field walkthrough

One deployment, seven phases.

A single AI voice agent traced end to end through the INTENT Framework: first notice of loss (FNOL) intake at a regional insurance carrier. Live, customer-facing, emotionally loaded, regulated. Every phase below shows the same three-way split:

the model generates · code enforces · humans judge

◇ diamonds are gates: evidence required to pass

The running example

A customer calls after a car accident. The agent verifies identity, collects incident details, screens for injury, opens the claim, and schedules an adjuster callback. The Intent Contract target: complete FNOL data in under 6 minutes for 85% of calls. The business metric: after-hours abandonment, currently 41%.

A composite deployment for illustration, not a client case study.
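The Intent Contract target is a threshold that code can compute directly, not something a model is asked to judge. A minimal sketch, assuming each call is logged with a duration in seconds and a flag for whether every required FNOL field was captured; `CallRecord`, its field names, and both function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical call record; the field names are illustrative assumptions.
@dataclass(frozen=True)
class CallRecord:
    duration_s: float    # total call time in seconds
    fnol_complete: bool  # True if every required FNOL field was captured


TARGET_SECONDS = 6 * 60  # "under 6 minutes"
TARGET_SHARE = 0.85      # "for 85% of calls"


def target_share(calls: list[CallRecord]) -> float:
    """Fraction of calls that captured complete FNOL data in under 6 minutes."""
    if not calls:
        return 0.0
    hits = sum(1 for c in calls if c.fnol_complete and c.duration_s < TARGET_SECONDS)
    return hits / len(calls)


def meets_intent_target(calls: list[CallRecord]) -> bool:
    """The contract threshold, computed by code rather than judged by a model."""
    return target_share(calls) >= TARGET_SHARE


# Usage: 20 calls, 18 of them complete and under 6 minutes.
calls = [CallRecord(300, True)] * 17 + [
    CallRecord(420, True),   # complete but too slow
    CallRecord(200, False),  # fast but incomplete
    CallRecord(310, True),
]
print(target_share(calls), meets_intent_target(calls))  # 0.9 True
```

A call counts toward the target only if it is both complete and fast, so a quick but incomplete intake cannot inflate the number.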

00 / 06 · optional phase

DISCOVER

Is this problem real, and is it worth a contract?

The carrier suspects FNOL intake is a problem but has not quantified it. An LLM reads six months of call-center transcripts and abandonment logs, work no human team would do at this scale. The pipeline that feeds it and the aggregation that follows are deterministic code, so the business case is built from math, not from the model's impressions.

The split: the model classifies each transcript. Code aggregates. A human looks at the numbers and decides whether this becomes a FRAME. The model never owns the go/no-go decision.

FNOL moment

The output that triggers everything downstream: 34% of FNOL calls arrive after hours. Of those, 41% abandon before reaching a human. A claims ops lead reads that and opens a contract.

◇ Gate

DISCOVER is optional and has no formal gate. The exit is a human judgment that the quantified problem justifies writing an Intent Contract.

transcript_mining.py (Python)

# CODE: deterministic pipeline. Owns iteration, sampling, storage.
# call_archive, llm, and aggregate stand in for deployment-specific services.
metrics = []
for transcript in call_archive.query(line="FNOL", period="6mo"):

    # MODEL: semantic classification. One cheap call per transcript.
    tags = llm.classify(transcript, schema={
        "outcome": ["completed", "abandoned", "transferred"],
        "abandonment_reason": str | None,
        "after_hours": bool
    })
    metrics.append(tags)

# CODE: aggregation. No model judgment in the numbers.
report = aggregate(metrics)

# HUMAN: reads the report. Owns the decision to open a FRAME.
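The aggregation step is plain arithmetic over the model's tags. A minimal sketch of what `aggregate` might compute, assuming each entry in `metrics` is a dict with the `outcome` and `after_hours` keys from the classification schema in `transcript_mining.py`, and treating `outcome == "abandoned"` as "abandoned before reaching a human". The function body and the output keys are illustrative assumptions, not the framework's implementation.

```python
def aggregate(metrics: list[dict]) -> dict:
    """Turn per-transcript tags into the numbers a human reads.

    Assumes each tag dict has "outcome" and "after_hours" keys;
    the output keys are hypothetical names chosen for this sketch.
    """
    total = len(metrics)
    after_hours = [m for m in metrics if m["after_hours"]]
    abandoned_after_hours = [m for m in after_hours if m["outcome"] == "abandoned"]

    def share(part: int, whole: int) -> float:
        # Guard against empty inputs instead of dividing by zero.
        return part / whole if whole else 0.0

    return {
        "calls": total,
        # share of all FNOL calls that arrive after hours
        "after_hours_share": share(len(after_hours), total),
        # share of after-hours calls that end abandoned
        "after_hours_abandon_rate": share(len(abandoned_after_hours), len(after_hours)),
        # share of all FNOL calls lost after hours (product of the two above)
        "lost_after_hours_share": share(len(abandoned_after_hours), total),
    }
```

With the figures reported above (34% of calls after hours, 41% of those abandoning), the last number works out to 0.34 × 0.41 ≈ 0.14: roughly one in seven FNOL calls is lost after hours.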

What the project actually ships

Not a suite of agents. The Intent Contract scopes one thing: a single, narrowly bounded voice agent that handles FNOL intake and hits the outcome written in FRAME. Keeping the deliverable that small is part of the discipline.

The deliverable

One provable agent

A governed FNOL voice agent with a versioned Trust Envelope, tested escalation paths, and a Proof Report showing it meets its thresholds. Measurable against the contract: capture time, completeness, abandonment.

What compounds

The governance substrate

The enforcement layer, scenario replay harness, runtime telemetry, and Constitution. Agent number two reuses roughly 80% of the Trust Envelope structure and all of the rails. The org buys the capability to ship governed agents repeatedly.

Teams that scope an "agent platform" on day one end up in the cancellation statistics. Teams that scope one provable agent plus the rails get the platform anyway, as a byproduct of evidence.
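To make the Trust Envelope and enforcement layer more concrete, here is a minimal sketch, under the assumption that an envelope can be expressed as versioned data (permitted actions plus signals that force a human handoff) and that enforcement is a deterministic check on every action the model proposes. `TrustEnvelope`, `Decision`, `enforce`, and all action and signal names are hypothetical; the framework does not specify this shape.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # hand the call to a human
    BLOCK = "block"


# Hypothetical shape of a versioned Trust Envelope; every field is an assumption.
@dataclass(frozen=True)
class TrustEnvelope:
    version: str
    allowed_actions: frozenset[str]
    escalate_on: frozenset[str]  # conversation signals that force a human handoff


FNOL_ENVELOPE = TrustEnvelope(
    version="1.0.0",
    allowed_actions=frozenset(
        {"verify_identity", "collect_incident", "open_claim", "schedule_callback"}
    ),
    escalate_on=frozenset({"injury_reported", "caller_requests_human"}),
)


def enforce(envelope: TrustEnvelope, action: str, signals: set[str]) -> Decision:
    """Deterministic check applied to each action the model proposes."""
    # Escalation is checked first, so an injury signal reaches a human
    # no matter which action the model wanted to take.
    if signals & envelope.escalate_on:
        return Decision.ESCALATE
    if action in envelope.allowed_actions:
        return Decision.ALLOW
    return Decision.BLOCK


print(enforce(FNOL_ENVELOPE, "open_claim", set()))                # Decision.ALLOW
print(enforce(FNOL_ENVELOPE, "open_claim", {"injury_reported"}))  # Decision.ESCALATE
print(enforce(FNOL_ENVELOPE, "issue_payment", set()))             # Decision.BLOCK
```

In this sketch a second agent would supply a different envelope while reusing `enforce` unchanged, which is one way the rails could carry over between deployments.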

The pattern across all seven phases

The model's job changes every phase. Code's job never changes: validate schemas, own state, fire triggers, enforce timeouts, compute thresholds, block gates. Humans sit exactly where judgment cannot be reduced to either.
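As one concrete instance of "validate schemas", a minimal sketch of a check that could sit between `llm.classify` and `metrics.append` in the DISCOVER pipeline. The field names mirror the schema in `transcript_mining.py`; the function name `validate_tags` and the choice to raise rather than retry or coerce are assumptions made for this sketch.

```python
ALLOWED_OUTCOMES = {"completed", "abandoned", "transferred"}
EXPECTED_KEYS = {"outcome", "abandonment_reason", "after_hours"}


def validate_tags(tags: dict) -> dict:
    """Reject model output that does not match the classification schema.

    Raises ValueError instead of silently coercing, so a malformed model
    response never reaches the aggregated numbers.
    """
    if set(tags) != EXPECTED_KEYS:
        raise ValueError(f"unexpected or missing keys: {sorted(set(tags) ^ EXPECTED_KEYS)}")
    if tags["outcome"] not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {tags['outcome']!r}")
    if not isinstance(tags["after_hours"], bool):
        raise ValueError("after_hours must be a bool")
    reason = tags["abandonment_reason"]
    if reason is not None and not isinstance(reason, str):
        raise ValueError("abandonment_reason must be a str or None")
    return tags
```

Failing loudly keeps the division of labor intact: the model may produce anything, but only output that passes code's check is counted.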

Phase	The model	Code	Humans
DISCOVER	Mines transcripts at scale	Aggregates the numbers	Own the go/no-go decision
FRAME	Drafts the contract	Validates schema in CI	Sign off on the risk tier
EXPLORE	Generates the plan	Checks constitution compliance	Run the Direction Check
BUILD	Implements at A2	Is the enforcement layer	Review the rails line by line
VALIDATE	Plays attacker and judge	Asserts paths and thresholds	Resolve judge disagreements
SHIP	Mostly idle	Gates, canaries, rollback	Approve the Proof Report
LEARN	Finds novel situations	Measures spec drift	Approve new scenarios
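Following the same split, a gate such as SHIP's can be written as a pure function of evidence: code checks the Proof Report numbers against thresholds, and human approval is a required input that code records but cannot supply. A minimal sketch; `ProofReport`, `ship_gate`, their fields, and the abandonment ceiling used in the usage line are hypothetical. Only the 85% target share comes from the Intent Contract above.

```python
from dataclasses import dataclass


# Hypothetical Proof Report summary; the fields are illustrative assumptions.
@dataclass(frozen=True)
class ProofReport:
    target_share: float            # share of calls with complete FNOL data in < 6 min
    abandonment_rate: float        # measured after-hours abandonment
    escalation_paths_passed: int
    escalation_paths_total: int


def ship_gate(report: ProofReport, *, max_abandonment: float,
              human_approved: bool) -> tuple[bool, list[str]]:
    """Return (passes, reasons). Every failed check is listed; none is waived."""
    reasons = []
    if report.target_share < 0.85:
        reasons.append(f"target share {report.target_share:.2f} < 0.85")
    if report.abandonment_rate > max_abandonment:
        reasons.append(f"abandonment {report.abandonment_rate:.2f} > {max_abandonment:.2f}")
    if report.escalation_paths_passed < report.escalation_paths_total:
        reasons.append("untested or failing escalation paths")
    if not human_approved:
        reasons.append("Proof Report not approved by a human")
    return (not reasons, reasons)


# Usage: the numbers clear every check, but the gate stays closed until a human signs.
report = ProofReport(target_share=0.88, abandonment_rate=0.12,
                     escalation_paths_passed=9, escalation_paths_total=9)
print(ship_gate(report, max_abandonment=0.20, human_approved=False))
# (False, ['Proof Report not approved by a human'])
```

Returning every failed reason, rather than stopping at the first, gives the human approving the Proof Report the full picture in one pass.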