
AI has made it cheap to express intent, explore ideas, and edit source code, but at the end of the change, the system has to settle into a state we can trust.

Take a simple story: we ask the agent to change the signup form validation. It updates the form, the server-side validation logic, and even the e2e test.
Code review is done, the change looks simple enough, tests pass, so we merge.

After some time, a customer reports they are unable to reset their password.
We investigate.
We ask the usual questions:

  • Who owns password reset?

  • Why did the review miss it?

  • Why did the test suite stay green?

Eventually, the blame lands on the agent. We create a hotfix, manually check signup and password reset, release it, and hope the fix didn't break anything else.

Change got cheap. Confidence did not.

What do we actually want to know?

Agents have a strange tendency to give us a false sense of confidence: they produce a long list of checkmarks and a plausible explanation. These artifacts describe the change from the agent’s perspective, but they do not clearly reflect what happened to the system.

What we actually want to know is usually more nuanced.

There’s the obvious, explicit part:

  • the agent changed what I asked it to change

  • it works

And then there is the broader, implicit part:

  • what became outdated was removed

  • conflicts were resolved, no contradictions remained

  • standards were followed, boundaries were respected

  • nothing else changed, the rest of the system remained operational

That second list is usually where confidence disappears.
Did the system move from one valid state to another, or did the agent satisfy the prompt while violating something outside the frame?

One might say the problem is the agent. In my opinion, it’s how we think about the system. Or rather, how we don’t.

What is a software system made of?

Each piece of software is a set of requirements expressed through physical artifacts.

A high-level business requirement like “users can sign up with email and password” is not implemented in one place; it is satisfied through several parts of the physical system: the signup form, client-side and server-side validation, the API contract, authentication, the user entity, the related database schema, and so on.

Those pieces are shared by other requirements too.

Change the validation logic or the user schema shape, and you risk breaking signup, or any other requirement that shares the same implementation, whether you named it or not.
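One way to make this concrete is to treat the link between requirements and implementation as data. The sketch below assumes a hand-written map from requirements to file paths (every path is hypothetical) and asks which requirements a change touches. Real projects rarely have such a map written down, which is part of the problem: the second requirement shows up whether or not anyone named it.

```typescript
// Hypothetical map from requirements (theoretical plane) to the files that
// implement them (physical plane). Paths are illustrative only.
type RequirementMap = Record<string, string[]>;

const requirementMap: RequirementMap = {
  "A user can sign up with an email and a password": [
    "src/signup/presentation/components/SignupForm.tsx",
    "src/signup/application/registerUser.ts",
    "src/shared/validation/password.ts",
  ],
  "A user can reset their password": [
    "src/password_reset/presentation/components/ResetForm.tsx",
    "src/password_reset/application/resetPassword.ts",
    "src/shared/validation/password.ts",
  ],
};

// Returns every requirement whose implementation includes at least one changed
// file, whether or not that requirement was mentioned in the task.
function affectedRequirements(changedFiles: string[], map: RequirementMap): string[] {
  const changed = new Set(changedFiles);
  return Object.entries(map)
    .filter(([, files]) => files.some((file) => changed.has(file)))
    .map(([requirement]) => requirement);
}

// Touching only the shared validator affects both requirements.
console.log(affectedRequirements(["src/shared/validation/password.ts"], requirementMap));
```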

Here's a simplified view of our system, as requirements and the corresponding implementations:

Requirements live on the theoretical plane.

The physical plane contains the things we can actually control or observe: code, tests, schemas, rules, logs, traces, UI behavior, diffs and so on.

We humans have access to the theoretical plane. An agent doesn’t. It can’t inspect a requirement directly. It inspects the places where that requirement touches the physical system.

There are also different kinds of requirements, which will come in handy later:

  • Some requirements describe a capability of the system — functional requirements.

    • A user can sign up with an email and a password

    • A user can reset their password

  • Other requirements describe a quality or constraint of the system — non-functional requirements.

    • UI components cannot import directly from persistence code.

    • Export completes under 30 seconds.
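Non-functional requirements become much easier to protect once they are written down as rules a machine can evaluate. Here is a minimal sketch for the boundary example above, assuming files live under `src/<feature>/<layer>/...` and that persistence code sits in an `adapters` layer; the rule table and file names are hypothetical, and a real project would more likely express this as a lint rule.

```typescript
// Sketch: a non-functional requirement written down as a checkable rule.
// Assumes paths shaped like src/<feature>/<layer>/...
interface ImportEdge {
  from: string; // importing file
  to: string; // imported file
}

interface BoundaryRule {
  fromLayer: string;
  toLayer: string;
  requirement: string;
}

const rules: BoundaryRule[] = [
  {
    fromLayer: "presentation",
    toLayer: "adapters", // assumption: persistence code lives in adapters/
    requirement: "UI components cannot import directly from persistence code.",
  },
];

// Reads the layer segment from a path like src/signup/presentation/...
function layerOf(path: string): string | undefined {
  const parts = path.split("/");
  return parts[0] === "src" ? parts[2] : undefined;
}

// Lists every import edge that breaks a boundary rule, with the requirement it breaks.
function boundaryViolations(edges: ImportEdge[], boundaryRules: BoundaryRule[]): string[] {
  const violations: string[] = [];
  for (const edge of edges) {
    const fromLayer = layerOf(edge.from);
    const toLayer = layerOf(edge.to);
    for (const rule of boundaryRules) {
      if (rule.fromLayer === fromLayer && rule.toLayer === toLayer) {
        violations.push(`${edge.from} -> ${edge.to}: ${rule.requirement}`);
      }
    }
  }
  return violations;
}

// One allowed import, one forbidden import.
console.log(
  boundaryViolations(
    [
      { from: "src/signup/presentation/components/SignupForm.tsx", to: "src/signup/adapters/userRepository.ts" },
      { from: "src/signup/presentation/components/SignupForm.tsx", to: "src/signup/application/registerUser.ts" },
    ],
    rules,
  ),
);
```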

For the system to be coherent, all requirements have to be satisfiable together. Contradictions in the requirement set eventually surface as instability in the running system.

That is what happened with the password reset bug.

What an agentic change does

A change is a shift in the requirement set.

  • New requirements are added.

  • Obsolete requirements are removed.

  • Contradictions between requirements are resolved.

A good change moves the system from one coherent requirement set to another.

When we ask an agent to make a change, we describe a desired shift on the theoretical plane. Through natural language instructions, we aim to express the new truth of the system.
The agent edits code, hoping it matches what we asked for.

A detailed breakdown of how the signup change happened:

  • First, the agent updated the signup form.

  • An e2e test checking the signup flow broke.

  • The agent updated the e2e test to check the new password validation rules.

  • The e2e test still failed, now with a server-side validation error.

  • The agent updated the backend (BE) password validation to meet the new requirement.

  • All tests passed. Production-ready.

The server-side and client-side validators shared an implicit connection that affected both signup and password reset. The agent didn't notice. Neither did we.
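To make the coupling concrete, here is a deliberately simplified reconstruction, not the actual code from the story. It assumes one server-side validator shared by signup and reset, and a separate client-side check on the reset form that nobody touched. The specific rules (8 vs. 12 characters, a required digit) and all function names are invented for illustration.

```typescript
// Hypothetical server-side validator shared by signup and password reset.
// The agent tightened it to satisfy the new signup requirement.
function validatePasswordServer(password: string): boolean {
  return password.length >= 12 && /\d/.test(password);
}

// Hypothetical client-side check on the reset form. Nobody changed it,
// so it still encodes the old rule.
function validateResetFormClient(password: string): boolean {
  return password.length >= 8;
}

type Outcome = "accepted" | "rejected by form" | "rejected by server";

// Both flows end up calling the same server-side validator.
function submitSignup(password: string): Outcome {
  return validatePasswordServer(password) ? "accepted" : "rejected by server";
}

function submitPasswordReset(password: string): Outcome {
  if (!validateResetFormClient(password)) return "rejected by form";
  return validatePasswordServer(password) ? "accepted" : "rejected by server";
}

// Signup rejects this password by design; the reset form accepts it,
// then the server rejects it and the user is stuck.
console.log(submitSignup("summer2024")); // rejected by server
console.log(submitPasswordReset("summer2024")); // rejected by server
```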

During code review, we try to project the physical diff back onto the requirement set. 
A perfect review would map every changed file to the affected requirements. In practice, you only see a fraction of that:

  • requirements named in the task

  • requirements noticed during review

  • requirements covered by automatic checks

That gives us a useful risk map, a way to examine requirements by their visibility and level of protection.

| | Contractualized | Non-contractualized |
|---|---|---|
| Explicit | named in the task and automatically checked | named in the task, but manual review has to verify it |
| Implicit | not named in the task, but covered by an automatic check | maybe affected by the change, but neither named nor checked |

The top row is the visible part of the review. If the requirement was named in the task, we at least know to look for it. If it is also contractualized, an automatic check can signal when it breaks. (Example: the frontend signup password validation.)

The bottom-left quadrant is where automation gives us an unexpected edge. The requirement was not named, but an existing check can still stop us from breaking it. (Example: the backend signup password validation.)

The bottom-right quadrant is the blind spot. The changed implementation participates in requirements that nobody named and no automated check protects. (Example: password reset.) Left unchecked, this blind spot grows into a black hole, slowly eating away at our product.
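The value of the risk map is the question it forces: which requirements sit in the bottom-right? Below is a minimal sketch that encodes the quadrants and feeds in the three examples from the signup story. In a real project, knowing whether a requirement is "automatically checked" is the hard part; here it is simply a boolean we assume.

```typescript
// Sketch of the risk map: visibility (named in the task or not)
// times protection (automatically checked or not).
interface RequirementStatus {
  name: string;
  namedInTask: boolean;
  automaticallyChecked: boolean;
}

type Quadrant =
  | "explicit, contractualized"
  | "explicit, non-contractualized"
  | "implicit, contractualized"
  | "blind spot";

function quadrantOf(r: RequirementStatus): Quadrant {
  if (r.namedInTask) {
    return r.automaticallyChecked ? "explicit, contractualized" : "explicit, non-contractualized";
  }
  return r.automaticallyChecked ? "implicit, contractualized" : "blind spot";
}

// The three requirements from the signup story.
const story: RequirementStatus[] = [
  { name: "FE signup password validation", namedInTask: true, automaticallyChecked: true },
  { name: "BE signup password validation", namedInTask: false, automaticallyChecked: true },
  { name: "Password reset", namedInTask: false, automaticallyChecked: false },
];

for (const r of story) {
  console.log(`${r.name}: ${quadrantOf(r)}`);
}
```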

I know this may sound like a lot. Theoretical plane. Physical plane. Explicit requirements. Implicit requirements. Blind spot. But bear with me, because this is where it starts to pay off in practice.

To stop the blind spot from growing, we must build mechanisms that protect the requirements we value. That is where convergence mechanisms come in: bridging the theoretical and the physical, closing the feedback loop.

Convergence Mechanisms: What actually protects requirements

A requirement is only protected when we have a mechanism to evaluate whether it still holds.

A convergence mechanism interprets physical evidence against requirements so we can judge whether the system still satisfies them.
Different requirements need different evidence. The table below provides a rough map of tools one can use for evaluating them. These are the physical building blocks of our convergence mechanisms.

| Requirement type | Static evaluation | Runtime evaluation |
|---|---|---|
| Functional | types, schemas, API contracts | unit tests, integration tests, e2e tests, walkthroughs |
| Non-functional | boundaries, dependency graphs, architecture rules, lint rules, code shape | monitoring, alerts, error rates, traces, logs, production behavior |

I think of a convergence mechanism as four parts: the requirement, the evidence, the interpreter, and the feedback.

The requirement might come from a spec, an API contract, or a user story. The evidence might come from a diff, a trace, or runtime behavior. The interpreter might be a linter, a test runner, or a reviewer reading through the change. The feedback might be a failure, a warning, or simply a green checkmark.
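The four parts map naturally onto a small type. This is a sketch under assumptions, not a framework: the `ConvergenceMechanism` interface, the `evaluate` helper, and the export-latency example (including its sample durations) are all hypothetical, chosen to show one runtime mechanism for the non-functional requirement "Export completes under 30 seconds."

```typescript
// Sketch of a convergence mechanism as its four parts:
// requirement, evidence, interpreter, feedback.
type Feedback =
  | { kind: "pass" }
  | { kind: "warning"; message: string }
  | { kind: "fail"; message: string };

interface ConvergenceMechanism<Evidence> {
  requirement: string; // what must keep holding
  collectEvidence: () => Evidence; // observation of the physical system
  interpret: (evidence: Evidence) => Feedback; // judges evidence against the requirement
}

function evaluate<Evidence>(mechanism: ConvergenceMechanism<Evidence>): Feedback {
  return mechanism.interpret(mechanism.collectEvidence());
}

// Hypothetical example: export durations in milliseconds, as a monitoring
// system might report them. The numbers are made up for illustration.
const exportLatency: ConvergenceMechanism<number[]> = {
  requirement: "Export completes under 30 seconds.",
  collectEvidence: () => [12400, 18900, 31200],
  interpret: (durationsMs) => {
    const slowest = Math.max(...durationsMs);
    return slowest < 30000
      ? { kind: "pass" }
      : { kind: "fail", message: `slowest export took ${slowest} ms` };
  },
};

console.log(evaluate(exportLatency));
```

Separating `collectEvidence` from `interpret` is what lets the same requirement be checked against different surfaces, for example a load test locally and production traces later.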

Once we separate those parts, the strength of a mechanism becomes easier to see.
A mechanism is stronger when:

  • the requirement representation is explicit

  • the observed surface is close to reality

  • the interpreter is reliable

  • the feedback is unambiguous

Unwritten rules are fragile because the requirement is only present in our memory.

Written guidance is somewhat stronger because the requirement is externalized. But it can still be skipped, outdated, or contradicted without anyone knowing.

Executable contracts are the most reliable because the requirement is explicit, the interpretation is automated, and the feedback is deterministic.

They also make contradictions between requirements harder to hide: two tests making incompatible promises, conflicting lint rules, or a type mismatch will result in a clear error.
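For the password story, an executable contract could be as small as a check asserting that every password validator in the system agrees on a shared set of samples. This is a sketch under assumptions: the validators, their rules, and the sample passwords are hypothetical, and in a repository like the one described later such a check would presumably run under Vitest; here it throws a plain `Error` so it stays dependency-free.

```typescript
// Hypothetical validators that must make the same promise about passwords.
const validators: Record<string, (password: string) => boolean> = {
  "signup form (client)": (pw) => pw.length >= 12 && /\d/.test(pw),
  "reset form (client)": (pw) => pw.length >= 8, // stale rule nobody updated
  "shared server validator": (pw) => pw.length >= 12 && /\d/.test(pw),
};

// Sample passwords chosen near the rule boundaries.
const samples = ["short1", "summer2024", "longenoughbutnodigits", "correcthorse42"];

// Returns one message per sample on which the validators disagree.
function findDisagreements(
  checks: Record<string, (password: string) => boolean>,
  passwords: string[],
): string[] {
  const problems: string[] = [];
  for (const pw of passwords) {
    const verdicts = Object.entries(checks).map(([name, check]) => [name, check(pw)] as const);
    const first = verdicts[0]?.[1];
    if (verdicts.some(([, ok]) => ok !== first)) {
      problems.push(`"${pw}": ` + verdicts.map(([name, ok]) => `${name}=${ok}`).join(", "));
    }
  }
  return problems;
}

// The contract: any disagreement is a deterministic, readable failure.
function assertValidatorsAgree(
  checks: Record<string, (password: string) => boolean>,
  passwords: string[],
): void {
  const disagreements = findDisagreements(checks, passwords);
  if (disagreements.length > 0) {
    throw new Error(`Password validators disagree:\n${disagreements.join("\n")}`);
  }
}

try {
  assertValidatorsAgree(validators, samples);
  console.log("validators agree");
} catch (error) {
  console.log((error as Error).message);
}
```

With these samples the contract reports two disagreements, both caused by the stale reset-form rule, which is exactly the contradiction that stayed invisible in the story.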

The stronger the mechanism, the less confidence depends on someone remembering the requirement at the right moment.

The Convergence Prism: How to Harden Your Convergence Mechanisms

Our requirements live at the center, surrounded by the mechanisms that keep the physical system aligned.
The upper half is the local agentic harness.
The lower half is where human judgment does the heavier lifting.

Hardening strengthens the mechanisms around the requirements.
It moves in two directions: upward, toward more executable feedback, and inward, closing the distance between the mechanism and the requirement it protects.

A mechanism that runs after the change is done can judge the work. One that runs as part of the agentic loop can steer it.

Static rules, type checks, and unit tests run inside the agent loop. E2E tests run in CI, before merge. Monitoring gives runtime feedback, but it sits outside the local harness.
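The difference between judging and steering is mostly about when feedback arrives and whether the agent can act on it. Below is a minimal sketch of a local loop, assuming each check is a function returning pass/fail plus output; the check names, the `stage` split, and the stubbed results are hypothetical, and in a real harness each `run` would invoke a `package.json` script.

```typescript
// Sketch of the local feedback loop: run cheap checks in order and
// return feedback the agent can act on.
interface CheckResult {
  ok: boolean;
  output: string;
}

interface Check {
  name: string;
  stage: "agent-loop" | "ci";
  run: () => CheckResult;
}

function runFeedbackLoop(checks: Check[], stage: Check["stage"]): { passed: boolean; feedback: string[] } {
  const feedback: string[] = [];
  for (const check of checks.filter((c) => c.stage === stage)) {
    const result = check.run();
    if (!result.ok) {
      // Stop at the first failure so the agent fixes one thing at a time.
      feedback.push(`${check.name} failed: ${result.output}`);
      return { passed: false, feedback };
    }
    feedback.push(`${check.name} passed`);
  }
  return { passed: true, feedback };
}

// Stubbed checks standing in for type check, lint, unit tests and e2e tests.
const checks: Check[] = [
  { name: "typecheck", stage: "agent-loop", run: () => ({ ok: true, output: "" }) },
  { name: "lint", stage: "agent-loop", run: () => ({ ok: false, output: "presentation imports adapters" }) },
  { name: "unit tests", stage: "agent-loop", run: () => ({ ok: true, output: "" }) },
  { name: "e2e tests", stage: "ci", run: () => ({ ok: true, output: "" }) },
];

console.log(runFeedbackLoop(checks, "agent-loop"));
```

Stopping at the first failure is one design choice among several; running every check and reporting all failures at once trades a longer loop for fewer iterations.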

Adding more guidance can feel like hardening, but guidance without feedback is another surface where entropy accumulates. Docs and skills can drift from the code, contradict other instructions, or preserve decisions that are no longer current.

The more guidance you add without connecting it to the feedback loop, the more the agent has to guess instead of verify.
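One cheap way to connect guidance to the feedback loop is to check the guidance itself. The sketch below flags commands mentioned in agent instructions that `package.json` no longer defines. It assumes commands appear as inline code such as `pnpm lint:boundaries` and treats every such mention as a script name (pnpm built-in commands are not handled); the guidance text and scripts are hypothetical.

```typescript
// Sketch: detect guidance that refers to commands package.json no longer defines.
function referencedScripts(markdown: string): string[] {
  const found = new Set<string>();
  for (const match of markdown.matchAll(/`pnpm\s+([\w:.-]+)`/g)) {
    const name = match[1];
    if (name) found.add(name);
  }
  return Array.from(found);
}

function staleCommands(markdown: string, scripts: Record<string, string>): string[] {
  return referencedScripts(markdown).filter((name) => !(name in scripts));
}

const guidance = [
  "- Architecture check command: `pnpm lint:boundaries`",
  "- E2E tests: `pnpm test:e2e`",
].join("\n");

// Hypothetical scripts section; `test:e2e` was renamed to `e2e` at some point.
const scripts = {
  "lint:boundaries": "eslint --config eslint.boundaries.config.js .",
  e2e: "playwright test",
};

// Lists "test:e2e" as stale.
console.log(staleCommands(guidance, scripts));
```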

The harness needs to do three things for the agent:

  • find the source of truth

  • run the relevant feedback loop

  • know what still requires human judgment

Executable files should carry the truth.
Agent guidance should route, not duplicate.
Agent instructions are not a diary: describe current constraints only, and record meaningful decisions in ADRs.
Human docs handle rationale, onboarding, and everything that cannot be expressed in code.

Here is what that looks like in practice.

```text
repo/
  package.json                         # commands the agent can run
  tsconfig.json                        # TypeScript constraints
  eslint.boundaries.config.js          # architecture / dependency rules
  eslint.code-style.config.js          # style and local code rules
  prettier.config.js                   # formatting rules
  playwright.config.ts                 # e2e configuration and test discovery
  vitest.config.ts                     # unit test configuration and test discovery

  README.md                            # human-facing onboarding
  docs/
    adr/                               # human-facing rationale and decisions

  AGENTS.md                            # thin router / index
  .agents/
    architecture.md                    # architectural intent + links to checks
    code_style.md                      # style intent + links to checks
    testing.md                         # when to run which feedback loop + links to checks
    skills/                            # optional, extract specific skills here if main guidance artifacts start to bloat
      review/
        SKILL.md

  src/
    feature_x/                         # feature slices, organized into horizontal layers according to your boundary rules
      ports/
      adapters/
      domain/
      application/
      presentation/
        components/
        hooks/
        view_models/
      utils/
      AGENTS.md                        # optional, only for current feature-specific constraints that cannot be codified
    shared/                            # general-purpose utilities, following the same layering rules
```

There's no right or wrong source code organization inside feature slices. You can use MVC, MVP, hexagonal architecture, or whatever you want as long as you're consistent. Agents are pattern-matching machines. The best thing you can do for them is to be predictable.

AGENTS.md

```markdown
# Agent instructions

Use this file as an index.

The source of truth lives in code, tests, schemas, rules and configs.

## Where to look

- Architecture and boundaries: `.agents/architecture.md`
- Code style: `.agents/code_style.md`
- Testing guidance: `.agents/testing.md`
- Repeatable workflows: `.agents/skills/`
- Features: read `.agents/architecture.md` to understand where features are located.

## Before changing code

1. Identify the feature slice affected by the task.
2. Identify the requirements likely affected by the change.
3. Prefer the smallest coherent change.
4. Run the relevant feedback loop based on testing, code style and architecture guidance:
   - Run type check.
   - Run lint.
   - Run tests for the affected slice.
   - Run boundary checks when imports or feature structure changed.
   - Run broader tests only when the change crosses slice boundaries.
   - If a check fails, treat it as feedback about a requirement. Do not ignore or loosen the check without human approval.

## Rules
- Do not assume requirements; ask for clarification when in doubt.
- If a requirement comes from documentation, check the source code to verify that it still holds before taking it for granted.
- If contradictions arise, ask for human clarification.
```
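Step 4 of the instructions is mechanical enough to sketch in code. The version below decides which feedback loops to run from the list of changed files. It assumes `src/<slice>/...` paths, and it treats any change under `shared` as crossing slice boundaries, which is our assumption (shared code serves several slices, as the password validator did), not something the instructions state. Whether imports or structure changed is passed in as a flag rather than detected.

```typescript
// Sketch: plan which feedback loops to run from the changed files.
interface CheckPlan {
  typecheck: boolean;
  lint: boolean;
  sliceTests: string[];
  boundaries: boolean;
  broadTests: boolean;
}

// Reads the slice from a path like src/<slice>/...
function sliceOf(path: string): string | undefined {
  const parts = path.split("/");
  return parts[0] === "src" && parts.length > 2 ? parts[1] : undefined;
}

function planChecks(changedFiles: string[], importsOrStructureChanged: boolean): CheckPlan {
  const slices = new Set<string>();
  for (const file of changedFiles) {
    const slice = sliceOf(file);
    if (slice) slices.add(slice);
  }
  return {
    typecheck: true, // always run
    lint: true, // always run
    sliceTests: Array.from(slices).sort(),
    boundaries: importsOrStructureChanged,
    // Assumption: touching shared code counts as crossing slice boundaries.
    broadTests: slices.size > 1 || slices.has("shared"),
  };
}

// The signup story: the form and the shared validator changed.
console.log(
  planChecks(["src/signup/presentation/components/SignupForm.tsx", "src/shared/validation/password.ts"], false),
);
```

Applied to the signup story, this plan would have asked for broader tests, which is where a password reset test could have caught the regression.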

.agents/architecture.md

```markdown
# Architecture

This project is organized around vertical feature slices with horizontal layers inside the slices.

## Source of truth

- Horizontal layers and import boundaries: `eslint.boundaries.config.js`
- Architecture check command: `pnpm lint:boundaries`
- Shared types and schemas: source files under `src/shared`
- Understanding features and behavior: check e2e tests in feature slices

## Current constraints that are not fully encoded

- Shared code should be introduced only when two or more slices need the same stable abstraction.
```

Towards Synergy

Agentic engineering will keep making change cheaper, but that does not make convergence automatic. It makes convergence our responsibility.

The ideal loop combines human judgment, agentic speed, and deterministic mechanisms that push back when something important stops holding.

Instead of expecting a fleet of agents to take care of quality, build the habit of hardening your convergence mechanisms one step at a time.
Write down an unwritten rule.
Move one stale instruction into a check.
Turn one repeated review comment into a test.
Point the agent to the source of truth instead of restating it.

Reduce the blind spot by one requirement. Over time, those steps add up.
The challenge is making the harness grow faster than the uncertainty your agents create.
