A useful agentic change does not end when the diff appears; it ends when the system is coherent again. AI has made intent easier to express and ideas cheaper to explore, but the system still has to settle into a state we can trust.

You have probably been in this situation: you ask the agent to change the signup form validation. It updates the form, the server-side validator, even the e2e test. Green across the board. You ship.

A couple of minutes later, the first bug report hits: Password reset is not working.

What no one noticed is that the two features share the same validation assumptions but implement them separately.

Then the usual questions come up:

  • Who touched the validator?

  • Who owned password reset?

  • Why did review miss it?

  • Why did the test suite stay green?

Eventually, the blame lands on the agent. You create a hotfix, check signup & password reset, release, and hope the fix didn't break something else.

Change got cheap. Confidence is more expensive than ever.

What do we actually want to know?

Agents have a weird tendency to feed us a false sense of confidence: they hand you a clean diff, a long list of checkmarks, and a good explanation, yet you still can't trust it. These artifacts describe the change from the agent’s perspective, but they do not clearly reflect what happened to the system.

What we actually want to know is usually more nuanced.

There’s the obvious, explicit part:

  • the agent changed what I asked it to change

  • it works

And then there is the broader, implicit part:

  • what became outdated was removed

  • conflicts were resolved, no contradictions remained

  • standards were followed, boundaries were respected

  • nothing else changed, the rest of the system remained operational

That second list is usually where confidence disappears.

Did the system move from one valid state to another, or did the agent satisfy the prompt while violating something outside the frame?

I spent months staring at green pipelines and still shipping bugs. Eventually I realized the problem wasn't the agent. It was how I thought about the system. Or more like how I didn't.

What is the system made of?

A software system is not just code. It is a set of requirements expressed through physical artifacts.

A high-level business requirement like “users can sign up with email and password” is not implemented in one place. It is satisfied through several parts of the physical system: the signup form, client-side and server-side validation, API contracts, authentication, session handling, redirects, error states, the user entity, database schema and so on.

Those parts are shared by other requirements too.

Change the validation logic, the redirect handling, or the user schema shape, and you risk breaking sign up, or any other requirement that shares the same implementation, whether you named it or not.

Picture the same system, before the agent touched it, as two planes.

Requirements live on the theoretical plane.

The physical plane contains the things we can actually control or observe: code, tests, schemas, rules, logs, traces, UI behavior, diffs and so on.

That separation matters because we cannot inspect a requirement directly in the running system.

We inspect the places where that requirement touches the physical system: implementation code, tests, rules, schemas, logs, traces, and runtime behavior.

Some requirements describe a capability of the system — functional requirements:

  • A user can sign up with an email and a password.

Other requirements describe a quality or constraint of the system — non-functional requirements:

  • UI components cannot import directly from persistence code.

  • Export completes under 30 seconds.

For the system to be coherent, all requirements have to be satisfiable together. Contradictions in the requirement set eventually surface as instability in the running system.

That is what happened with the password reset bug.

What an agentic change does

A change is not merely code moving. It is a shift in the requirement set.

  • New requirements are added.

  • Obsolete requirements are removed.

  • Contradictions between requirements are resolved.

A good change moves the system from one coherent requirement set to another.

When we ask an agent to make a change, we describe a desired shift on the theoretical plane. Through natural language instructions, we aim to express the new truth of the system.

The agent does not edit requirements. It edits code, hoping the code now matches what you asked for.

How the change went:

  • First, it updated the signup form.

  • An e2e test checking the signup flow broke.

  • It updated the e2e test to check the new password validation requirements.

  • The e2e test still failed, this time with a server-side validation error.

  • It updated the password validation on the BE to meet the new requirement.

  • All tests pass. Production-ready.

The server-side and client-side validators shared an implicit connection that affected both signup and password reset. The agent didn't notice. Neither did we.
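
A minimal sketch of how that kind of drift can look (file names and rules are invented for illustration, not taken from the actual incident):

// src/signup/validate_password.ts (updated by the agent for the new requirement)
export function validateSignupPassword(password: string): boolean {
  return password.length >= 12 && /[^A-Za-z0-9]/.test(password);
}

// src/password_reset/validate_password.ts (a separate copy nobody touched)
export function validateResetPassword(password: string): boolean {
  return password.length >= 8; // still encodes the old assumption
}

// The shared server-side validator now enforces the new rule for both flows,
// so a password that the reset UI accepts can be rejected by the server.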

During code review, we try to project the physical diff back onto the requirement set. The agent changed implementation artifacts. Which requirements are connected to those artifacts? That is harder than it sounds, because requirements do not map to files one-to-one.

A perfect review would map every requirement to every changed file. In practice, you see a fraction of that:

  • requirements named in the task

  • requirements noticed during review

  • requirements covered by automatic checks

That gives us what I think of as the Blind Spot Map: a way to examine requirements by their visibility and level of protection.

|              | Contractualized                                           | Non-contractualized                                          |
| ------------ | --------------------------------------------------------- | ------------------------------------------------------------ |
| **Explicit** | named in the task and automatically checked                | named in the task, but manual review has to verify it         |
| **Implicit** | not named in the task, but covered by an automatic check   | maybe affected by the change, but neither named nor checked   |

The top row is the visible part of the review. If the requirement was named in the task, we at least know to look for it. If it is also contractualized, an automatic check can signal when it breaks. (FE Signup password validation)

The bottom-left quadrant is where automation gives us the unexpected edge. The requirement was not named, but an existing check can still protect us from hurting it. (BE Signup password validation)

The bottom-right quadrant is the blind spot. The changed implementation participates in requirements nobody named, and no automated check protects. (Password Reset) Left unchecked, this blind spot grows into a black hole, slowly eating away at our product.

I know this may sound like a lot. Theoretical plane. Physical plane. Explicit requirements. Implicit requirements. Blind spots. But hang with me, because this is where it becomes practical.

The blind spot shrinks only when we build mechanisms that protect the requirements we value. That is where convergence mechanisms come in: bridging the theoretical and the physical, closing the feedback loop.

Convergence Mechanisms: What actually protects requirements

A requirement is only protected when we have a mechanism to evaluate whether it still holds.

A convergence mechanism interprets physical evidence against requirements so we can judge whether the system still satisfies them.

Different requirements need different evidence. The table below provides a rough map of tools we can use for evaluating them. These are the physical building blocks of our convergence mechanisms.

| Requirement    | Static evaluation                                                          | Runtime evaluation                                                   |
| -------------- | -------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| Functional     | types, schemas, API contracts                                               | unit tests, integration tests, e2e tests, walkthroughs                 |
| Non-functional | boundaries, dependency graphs, architecture rules, lint rules, code shape   | monitoring, alerts, error rates, traces, logs, production behavior     |
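
As one concrete example, the boundary requirement from earlier ("UI components cannot import directly from persistence code") can be turned into a static check. Here is a minimal sketch using ESLint's built-in no-restricted-imports rule in a flat config; a real setup might use a dedicated boundaries plugin instead, and the file globs and paths here are assumptions:

// eslint.boundaries.config.js (sketch only; real globs and layer names will differ per project)
export default [
  {
    files: ['src/**/presentation/**/*.{ts,tsx}'],
    rules: {
      // Non-functional requirement: UI components cannot import persistence code directly.
      'no-restricted-imports': [
        'error',
        {
          patterns: [
            {
              group: ['**/adapters/**'],
              message: 'Presentation code must not import adapters directly; go through the application layer.',
            },
          ],
        },
      ],
    },
  },
]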

I think of it as four parts: the requirement, the evidence, the interpreter, and the feedback.

The requirement might come from a spec, an API contract, or a user story. The evidence might come from a diff, a trace, or runtime behavior. The interpreter might be a linter, a test runner, or a reviewer reading through the change. The feedback might be a failure, a warning, or simply a green checkmark.
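
One way to see those four parts side by side, sketched as a type (purely a thinking aid, not a real API):

// Illustration only: the four parts of a convergence mechanism.
type ConvergenceMechanism = {
  requirement: string; // e.g. "passwords must be at least 12 characters"
  evidence: 'diff' | 'trace' | 'test run' | 'runtime behavior';
  interpreter: 'type checker' | 'linter' | 'test runner' | 'reviewer';
  feedback: 'failure' | 'warning' | 'green checkmark';
};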

Once we separate those parts, the strength of a mechanism becomes easier to see.
A mechanism is stronger when:

  • the requirement representation is explicit

  • the observed surface is close to reality

  • the interpreter is reliable

  • the feedback is unambiguous

Unwritten rules are fragile because the requirement is only represented in our memory.

Written guidance is somewhat stronger because the requirement is externalized. But it can still be skipped, outdated, or contradicted without anyone knowing.

Executable contracts are the most reliable because the requirement is explicit, the interpretation is automated, and the feedback is deterministic.
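
A sketch of what an executable contract could look like in this story, assuming a Vitest setup and a hypothetical shared password policy module:

// src/shared/password_policy.ts (one explicit representation of the requirement)
export const PASSWORD_MIN_LENGTH = 12;

export function isValidPassword(password: string): boolean {
  return password.length >= PASSWORD_MIN_LENGTH && /[^A-Za-z0-9]/.test(password);
}

// src/shared/password_policy.test.ts (the interpreter is the test runner, the feedback is deterministic)
import { describe, expect, it } from 'vitest';
import { isValidPassword, PASSWORD_MIN_LENGTH } from './password_policy';

describe('password policy (shared by signup and password reset)', () => {
  it('rejects passwords shorter than the minimum length', () => {
    expect(isValidPassword('a!')).toBe(false);
  });

  it('accepts passwords that satisfy the policy', () => {
    expect(isValidPassword('x'.repeat(PASSWORD_MIN_LENGTH - 1) + '!')).toBe(true);
  });
});

If both signup and password reset import the same policy, the assumption lives in one place, and a change that tightens it either updates both flows or fails loudly.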

They also make contradictions between requirements harder to hide: two tests making incompatible promises, conflicting lint rules, a schema mismatch, or a type error.

The stronger the mechanism, the less confidence depends on someone remembering the requirement at the right moment.

The Convergence Prism - How to Harden Your Convergence Mechanisms

The Convergence Prism puts requirements at the center, surrounded by the mechanisms that keep the physical system aligned.
The upper half is the local agentic harness. The lower half is where human judgment does the heavier lifting.

Hardening strengthens the mechanisms around the requirements.

Hardening moves in two directions: upward, toward more executable feedback, and inward, closing the distance between the mechanism and the requirement it protects.

A mechanism that runs after the change is done can judge the work. One that runs as part of the agentic loop can steer it.

Static rules, type checks, and unit tests run inside the agent loop. E2E tests run in CI, before merge. Monitoring gives runtime feedback, but it sits outside the local harness.

Adding more guidance can feel like hardening, but guidance without feedback is another surface where entropy accumulates. Docs and skills can drift from the code, contradict other instructions, or preserve decisions that are no longer current.

The more guidance you add without connecting it to the feedback loop, the more the agent has to guess instead of verify.

The harness needs to do three things for the agent:

  • find the source of truth

  • run the relevant feedback loop

  • know what still requires human judgment

Executable files should carry the truth. Agent guidance should route, not duplicate. It explains current constraints; it does not preserve history. Agent instructions are not a diary. Human docs handle rationale, onboarding, and everything that cannot be expressed in code.

Here is what that looks like in practice.

repo/
  package.json                         # commands the agent can run
  tsconfig.json                        # TypeScript constraints
  eslint.boundaries.config.js          # architecture / dependency rules
  eslint.code-style.config.js          # style and local code rules
  prettier.config.js                   # formatting rules
  playwright.config.ts                 # e2e configuration and test discovery
  vitest.config.ts                     # unit test configuration and test discovery

  README.md                            # human-facing onboarding
  docs/
    adr/                               # human-facing rationale and decisions

  AGENTS.md                            # thin router / index
  .agents/
    architecture.md                    # architectural intent + links to checks
    code_style.md                      # style intent + links to checks
    testing.md                         # when to run which feedback loop + links to checks
    skills/                            # optional, extract specific skills here if main guidance artifacts start to bloat
      review/
        SKILL.md

  src/
    feature_x/                         # feature slices, organized into horizontal layers according to your boundary rules
      ports/
      adapters/
      domain/
      application/
      presentation/
        components/
        hooks/
        view_models/
      utils/
      AGENTS.md                        # optional, only for current feature-specific constraints that cannot be codified
    shared/                            # general-purpose utilities, following the same layering rules

There's no right or wrong source code organization inside feature slices. You can use MVC, clean architecture, or hexagonal architecture, as long as you're consistent. Agents are pattern-matching machines. The best thing you can do for them is to be predictable.

AGENTS.md

# Agent instructions

Use this file as an index.

The source of truth lives in code, tests, schemas, rules and configs.

## Where to look

- Architecture and boundaries: `.agents/architecture.md`
- Code style: `.agents/code_style.md`
- Testing guidance: `.agents/testing.md`
- Repeatable workflows: `.agents/skills/`
- Features: read `.agents/architecture.md` to understand where features are located.

## Before changing code

1. Identify the feature slice affected by the task.
2. Identify the requirements likely affected by the change.
3. Prefer the smallest coherent change.
4. Run the relevant feedback loop based on testing, code style and architecture guidance:
   - Run type check.
   - Run lint.
   - Run tests for the affected slice.
   - Run boundary checks when imports or feature structure changed.
   - Run broader tests only when the change crosses slice boundaries.
   - If a check fails, treat it as feedback about a requirement. Do not ignore or loosen the check without human approval. 

## Rules
- Do not guess unclear requirements. Ask for clarification when the requirement changes behavior, boundaries, or contracts.
- If a requirement is enforced by tooling, follow the tool and run the check.
- If a requirement comes from documentation, check the source code to verify if it still holds before taking it for granted. 
- If contradictions arise, ask for human clarification.

.agents/architecture.md

# Architecture

This project is organized around vertical feature slices with horizontal layers inside the slices.

## Source of truth

- Horizontal layers and import boundaries: `eslint.boundaries.config.js`
- Architecture check command: `pnpm lint:boundaries`
- Shared types and schemas: source files under `src/shared`
- Feature-local behavior: e2e tests inside each feature slice

## Current constraints that are not fully encoded

- Shared code should be introduced only when two or more slices need the same stable abstraction.
- If a change crosses feature boundaries, explain why in the final response.
- If the current structure conflicts with the requested change, ask before reshaping the architecture.

Towards Synergy

Agentic engineering will keep making change cheaper, but that does not make convergence automatic. It makes convergence our responsibility.

The ideal loop combines human judgment, agentic speed, and deterministic mechanisms that push back when something important stops holding.

Instead of expecting a fleet of agents to take care of quality, build the habit of tightening the harness one step at a time.
Write down an unwritten rule.
Move one stale instruction into a check.
Turn one repeated review comment into a test.
Point the agent to the source of truth instead of restating it.

One step will not protect the whole system, but it reduces the blind spot by one requirement. Over time, those steps add up.
The challenge is making the harness grow faster than the uncertainty your agents create.
