I used to think that brittle e2e tests were mainly a code hygiene issue. Use accessibility-based locators, abstract implementation details, give everything a readable name, and the test will last.
It took me longer than I'd like to admit to realize that readability and durability are entirely different problems.
A test can be elegant, well-structured, and still collapse the moment the UI shifts. The issue was not how the test looked, but what layer of the system it was anchored to.
Every test is an attempt to write a contract for a promise the system must uphold. Problems begin when the contract requires more from the system than is necessary to fulfill that promise: when the surface the test is tied to changes at a different pace than the promise it tries to guarantee.
That is where a lot of hidden entropy enters e2e suites.
Named for the Business, Anchored to the UI
I think we’ve all seen tests like this one.
There is nothing special about it. It’s OK. It builds on accessible locators and uses retrying assertions. It walks through a business flow: create a business party and confirm the result. Reasonable enough.
But read it carefully. What conditions does it actually require to pass?
test('create business party', async ({ page }) => {
const partyList = page.getByTestId('Components.PartyList');
await partyList.getByRole('button', { name: /add party/i }).click();
const modal = page.getByTestId('Components.PartyModal');
await modal.getByRole('button', { name: /business/i }).click();
const entityName = modal.getByTestId('Components.PartyModal.PartyModalBusinessForm.entityName');
await entityName.getByRole('combobox').fill('Acme Inc.');
await entityName.getByRole('option', { name: /create/i }).click();
await modal.getByRole('textbox', { name: /address/i }).fill('123 Main St');
await modal.getByTestId('Components.PartyModal.submitButton').click();
await expect(partyList.getByTestId('Components.PartyList.PartyRow').filter({ hasText: 'Acme Inc.' })).toBeVisible();
});
Nothing feels wrong while you're writing it. But once it starts to fail, and you have to change it, you realize how much it assumes about the current UI.
Which locators identify the modal, how the combobox flow works, and where the new party appears after creation completes.
When any of those details change, the test does not necessarily fail in a useful way. Sometimes it breaks even though the feature still works. Either way, it starts making demands that have nothing to do with the behavior you were trying to protect.
That misalignment shows up in symptoms that are probably familiar:
Brittle - The contract is entangled with details that change more frequently than the promise itself.
Flaky - The test does not clearly define the conditions that mark an action safe to start or successfully completed.
Repetitive - The same knowledge is scattered across many places.
Unreadable - Intent is buried beneath mechanics.
Meanwhile, the test still claims to protect business behavior. But what it actually protects is much harder to name.
Quieter Mechanics, Same Contract
A common first improvement is to clean up the test.
The raw script becomes more readable. Repeated steps get grouped. The test starts to express something closer to intent.
This is the kind of shift I mean:
test('create business party', async ({ page }) => {
const partiesPage = new PartiesPage(page);
await partiesPage.addPartyButton.click();
const modal = partiesPage.partyModal;
await modal.businessTab.click();
await modal.companyNameCombobox.addNew('Acme Inc.');
await modal.address.fill('123 Main St');
await modal.submitButton.click();
await expect(partiesPage.partyRows.filter({ hasText: 'Acme Inc.' })).toBeVisible();
});
This one is cleaner. Mechanical entropy goes down. The reader no longer has to reconstruct meaning from a stream of clicks and locators.
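For concreteness, here is one shape the PartiesPage object behind that test could take. This is a sketch, not the article's actual implementation; the test IDs and roles are assumptions carried over from the first example, and the Locator/Page stand-ins exist only to keep the snippet self-contained:

```typescript
// Minimal stand-ins for Playwright's Page and Locator so the sketch is
// self-contained; a real suite would import these from '@playwright/test'.
interface Locator {
  getByRole(role: string, options?: { name?: RegExp }): Locator;
  getByTestId(testId: string): Locator;
  filter(options: { hasText?: string }): Locator;
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}
type Page = Locator;

// Hides the "type, then pick the create option" dance behind one
// intention-revealing method.
class CreatableCombobox {
  constructor(private readonly root: Locator) {}

  async addNew(value: string): Promise<void> {
    await this.root.getByRole('combobox').fill(value);
    await this.root.getByRole('option', { name: /create/i }).click();
  }
}

// The modal's locator knowledge lives here, in one place.
class PartyModal {
  readonly businessTab: Locator;
  readonly companyNameCombobox: CreatableCombobox;
  readonly address: Locator;
  readonly submitButton: Locator;

  constructor(root: Locator) {
    this.businessTab = root.getByRole('button', { name: /business/i });
    this.companyNameCombobox = new CreatableCombobox(
      root.getByTestId('Components.PartyModal.PartyModalBusinessForm.entityName'),
    );
    this.address = root.getByRole('textbox', { name: /address/i });
    this.submitButton = root.getByTestId('Components.PartyModal.submitButton');
  }
}

class PartiesPage {
  readonly addPartyButton: Locator;
  readonly partyModal: PartyModal;
  readonly partyRows: Locator;

  constructor(page: Page) {
    const list = page.getByTestId('Components.PartyList');
    this.addPartyButton = list.getByRole('button', { name: /add party/i });
    this.partyModal = new PartyModal(page.getByTestId('Components.PartyModal'));
    this.partyRows = list.getByTestId('Components.PartyList.PartyRow');
  }
}
```

Note that the UI knowledge has not disappeared; it has only been consolidated, so a markup change means one edit here instead of many edits across tests.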
But it is not the whole story.
Reducing the entropy of how a test operates does not automatically settle what kind of contract it is encoding.
This test is still making UI-level assumptions. Even though it no longer cares about specific markup, it still cares about the interface itself. A business is selected through the modal and a combobox. The resulting row appears.
That is a more stable contract. But it is still a UI contract.
What Survives UI Changes
A different contract becomes visible once the scope moves beyond the UI.
An application-scope test focuses on something else entirely. Not how the interface works, but that a business party can be created, and that the resulting entity exists in the system with the expected data intact. The UI becomes just one way to reach that outcome.
Here is what the same flow looks like when the contract is application-scope:
test('create business party', async ({ page }) => {
const parties = new Parties(page);
await parties
.addBusiness({ companyName: 'Acme Inc.', address: '123 Main St' })
.create();
await expect.poll(async () => parties.get('Acme Inc.')).not.toBeUndefined();
});
That contract should hold regardless of the interface. Whether a user reaches it through the UI, through an external API, through an MCP server, or through any other surface the application exposes, the underlying outcome is the same.
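One way to back that test, sketched with assumed names rather than the article's actual code: the UI steps become a single translation layer inside the driver, and the observation (get) becomes a question about the system rather than a specific row locator. The Locator/Page stand-ins again exist only to keep the snippet self-contained:

```typescript
// Minimal stand-ins for Playwright's Page and Locator so the sketch is
// self-contained; a real suite would import these from '@playwright/test'.
interface Locator {
  getByRole(role: string, options?: { name?: RegExp }): Locator;
  getByTestId(testId: string): Locator;
  filter(options: { hasText?: string }): Locator;
  fill(value: string): Promise<void>;
  click(): Promise<void>;
  count(): Promise<number>;
}
type Page = Locator;

interface BusinessPartyDraft {
  companyName: string;
  address: string;
}

// Builder returned by addBusiness(); create() drives the UI today, but an
// API-backed implementation could fulfil the same contract tomorrow.
class PartyBuilder {
  constructor(
    private readonly page: Page,
    private readonly draft: BusinessPartyDraft,
  ) {}

  async create(): Promise<void> {
    const modal = this.page.getByTestId('Components.PartyModal');
    await this.page.getByRole('button', { name: /add party/i }).click();
    await modal.getByRole('button', { name: /business/i }).click();
    await modal.getByRole('combobox').fill(this.draft.companyName);
    await modal.getByRole('option', { name: /create/i }).click();
    await modal.getByRole('textbox', { name: /address/i }).fill(this.draft.address);
    await modal.getByTestId('Components.PartyModal.submitButton').click();
  }
}

class Parties {
  constructor(private readonly page: Page) {}

  addBusiness(draft: BusinessPartyDraft): PartyBuilder {
    return new PartyBuilder(this.page, draft);
  }

  // Observation: does a party with this name exist? Returns undefined when
  // absent, which is what the expect.poll in the test keys on.
  async get(name: string): Promise<string | undefined> {
    const rows = this.page
      .getByTestId('Components.PartyList.PartyRow')
      .filter({ hasText: name });
    return (await rows.count()) > 0 ? name : undefined;
  }
}
```

The point of the shape is that the test above never has to change if create() is reimplemented against an API instead of the modal.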
Neither test is universally better. They just protect different things.
When writing a test, the first question is not how to automate it.
It is what promise the test is supposed to protect, and when you would expect that promise to change.
A UI-scope contract should change when the UI behavior changes.
An application-scope contract should change when the application capability itself changes.
A failing pipeline is not the enemy, but after too many false alarms, it can feel like one. When promise and contract are aligned, that changes. It stops being noise and starts being useful: red becomes rarer, less arbitrary, and much easier to interpret.
Scope alignment does not mean a red test always signals product breakage. Mechanics can still break. A translation layer can stop observing the system correctly. But if the promise is still intact, the repair stays local to the mechanics. The test itself does not need to be rewritten just because the system found a new way to express the same thing.
What Scope Alignment Unlocks
One could say: "Ok. So tests should read like their title. Then why go through all the trouble of refactoring when you could have just renamed the test?"
I get the appeal - we humans gravitate toward the path of least resistance. Rename the test, and it reads more honestly. The scope of mechanics and promise would be aligned. But would that be a promise you are comfortable making? Would you say that verifying the "add signatory" button opens the "signatory form" inside the "party modal" justifies bringing up the whole application environment?
The real win is not naming alignment. It is the lens it creates. Once the contract is explicit, the important decisions automatically surface.
A UI-scope test can stay focused on UI behavior.
An application-scope test can stay focused on the capability itself.
A targeted API test can stay focused on service behavior.
Different tests stop competing with each other, because they are no longer all trying to stabilize the same mixed bundle of concerns.
That also reduces the reasons a test should fail.
A detailed UI test that is intentionally independent from backend behavior should mostly fail for UI regressions.
A business capability can still be protected elsewhere through a higher-level end-to-end path or a more targeted API test.
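For example, the same capability could be pinned down by a targeted API check. The endpoint, payload shape, and helper name below are assumptions for illustration, not the application's real API, and the ApiRequestContext stand-in mirrors the shape of Playwright's request fixture only loosely:

```typescript
// Stand-in for Playwright's APIRequestContext so the sketch is self-contained;
// in a real suite this would be the `request` fixture from '@playwright/test'.
interface ApiResponse {
  ok(): boolean;
  json(): Promise<unknown>;
}
interface ApiRequestContext {
  post(url: string, options: { data: unknown }): Promise<ApiResponse>;
  get(url: string): Promise<ApiResponse>;
}

// Hypothetical API-scope contract: a business party can be created and is
// subsequently listed. No UI assumptions at all.
async function businessPartyCreatedViaApi(
  request: ApiRequestContext,
  draft: { companyName: string; address: string },
): Promise<boolean> {
  const created = await request.post('/api/parties', {
    data: { type: 'business', ...draft },
  });
  if (!created.ok()) return false;

  const listed = await request.get('/api/parties');
  const parties = (await listed.json()) as Array<{ companyName: string }>;
  return parties.some((party) => party.companyName === draft.companyName);
}
```

Inside a Playwright test this would run against the built-in request fixture, leaving the detailed UI suite free to fail only for UI reasons.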
Once the contract is aligned with the promise, the suite becomes much easier to shape around the product's real priorities, constraints, and pain points.
It becomes a lot easier to create fast, isolated feedback loops that both humans and agents can hugely benefit from.
Expensive, high-confidence checks stay focused on the few capabilities that actually deserve them.
And questions about mechanics become more strategic: what do we want to protect through the UI, what through an API, and how much mechanical fragility are we willing to tolerate for each?
These are the foundations that let you build e2e suites which serve those priorities instead of fighting them.
If you have tests that constantly seem to change for the wrong reasons, do yourself a favor. Take a step back and ask:
What promise is this test supposed to protect?
Is the contract operating at the same scope as that promise?
What surface is the contract tied to? What would cause this test to change, and how often?
Is that maintenance cost worth this promise, or should I protect a higher-level promise until the system solidifies?
Trust me, these questions look heavier than they actually are. Once the promise is clear, they kinda answer themselves.