Tackle flaky tests

Recognize the patterns behind flaky tests and apply the right tool for each.

A test is flaky when it passes on one run and fails on the next without the code under test changing. Flake is rarely random. Most of the time it's a race the test happens to lose, and the trick is recognizing which race so you can reach for the matching tool.

How to spot a flaky test

Before you can fix flake, you have to reproduce it on demand. A test that fails one run in fifty slips through a green PR and turns up months later as a "weird CI thing" no one can pin down.

My favorite trick is to run the test in a tight loop with --repeat-each:

npx playwright test tests/cart.spec.ts --repeat-each=20

That runs every test in the file 20 times. If it's racy, you'll usually see the failure within the first few iterations, and from then on you have a reliable signal for whether your fix actually worked.

A few knobs that pair well:

  • --retries=0 so a single retry doesn't hide the failure you're trying to surface.
  • --workers=1 when you suspect tests are stepping on each other instead of being independently flaky.
  • The HTML report's flaky marker. A test that fails on attempt 1 and passes on attempt 2 is flagged as flaky — not green. If you only watch the CI exit code, you'll never see the ones you already have.
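
If you go hunting often, the same knobs can live in a small config file of their own. The following is a minimal sketch, not something this workshop ships: the file name playwright.flake.config.ts is hypothetical, and the config is written as a plain object so the fragment stands alone (wrapping it in defineConfig should behave the same). repeatEach, retries, and workers are the config-file counterparts of the CLI flags above.

```typescript
// playwright.flake.config.ts (hypothetical file name)
// A config used only for hunting flake, mirroring the CLI flags above.
const flakeHuntConfig = {
  repeatEach: 20, // same as --repeat-each=20
  retries: 0, // same as --retries=0: never let a retry mask a failure
  workers: 1, // same as --workers=1: rule out tests stepping on each other
};

export default flakeHuntConfig;
```

You would then point the runner at it with npx playwright test tests/cart.spec.ts --config=playwright.flake.config.ts.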

Once you can reliably reproduce the failure, the toolkit below has something to bite into.

Why tests get flaky

Three common causes, in roughly the order I see them:

Hydration race

The DOM is painted, the test clicks, but the JS handler isn't wired up yet. The click fires into nothing.

Fix: Retry the action with expect().toPass().

Network in flight

An AJAX response lands mid-test and overwrites a value you just filled, or shifts the layout under your click.

Fix: Wait on the UI signal with a web-first expect().

Unexpected overlay

A newsletter popup, cookie banner, or interstitial slides in on some runs and catches the click meant for your button.

Fix: Register a page.addLocatorHandler().

The right fix depends on which one you're hitting — each card points to the matching tool in the toolkit below.

"Flaky Fixes", in priority order

1. Web-first assertions

As explained earlier, web-first assertions retry until they pass or time out. Together with Playwright's auto-waiting, they are the single most effective way to avoid flakiness.

Avoid

Captures the value at one moment in time. The assertion can't wait for the heading to settle.

const text = await page.locator("h1").textContent();
expect(text).toBe("Welcome");

Avoid

Avoid hard-coded timeouts. A fixed sleep is either too short on a slow run, so the test fails, or longer than needed, so the suite slows down.

await page.waitForTimeout(2_000);

Prefer

Retries up to the configured timeout, so a slow render still passes without a sleep.

await expect(
  page.getByRole("heading", { level: 1 })
).toHaveText("Welcome");

// or
await expect(
  page.getByRole("heading", { level: 1, name: "Welcome" })
).toBeVisible();

The most common flake disappears the moment you stop reading a value once and asserting on that snapshot, and let Playwright auto-wait instead. The shop intentionally includes spots where elements appear with a delay. If you stick to auto-waiting and web-first assertions, those just work.
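
The timeout those assertions retry against is configurable. As a hedged sketch: Playwright's default for expect is 5 seconds, and a config fragment like the one below raises it for the whole project. The 10-second value is only an illustration, and the object is left unwrapped so the fragment stands alone.

```typescript
// Fragment of playwright.config.ts. The 10 s value is an assumption for illustration.
const config = {
  expect: {
    // Upper bound for how long a web-first assertion keeps retrying.
    timeout: 10_000,
  },
};

export default config;
```

A single slow assertion can get its own budget instead: web-first matchers such as toHaveText accept a { timeout } option as their last argument.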

2. expect().toPass() — retry the whole block

Sometimes the click itself is racy: the button is there and Playwright clicks it, but the handler isn't attached yet, so nothing happens. This is almost always caused by poor frontend hydration patterns and is especially noticeable under slow network conditions.

If the frontend can't be improved, the only option is to retry the way a human would: if you click something and nothing happens, you wait a moment and click again.

Retry the click and the assertion together.

// retry the click and assertion until they work...
await expect(async () => {
  await page.getByRole("button", { name: "Review order" }).click();
  await expect(page).toHaveURL(/review/);
}).toPass({ timeout: 10_000 });

toPass is the right answer when the action needs to be retried, not just the assertion. Hydration races are the canonical case.
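
To make the difference concrete, here is a small self-contained simulation of the idea. It is not Playwright's implementation: LateHydratedButton and retryBlock are hypothetical stand-ins for a button that drops clicks until its handler is attached, and for what toPass does conceptually.

```typescript
// Hypothetical stand-in for a button whose click handler is attached late.
class LateHydratedButton {
  attempts = 0;
  landed = 0;
  private readonly hydratedAt: number;

  constructor(hydrationDelayMs: number) {
    this.hydratedAt = Date.now() + hydrationDelayMs;
  }

  click(): void {
    this.attempts += 1;
    // Before hydration the click is silently dropped.
    if (Date.now() >= this.hydratedAt) this.landed += 1;
  }
}

// Conceptual sketch of "retry the whole block": re-run `block` until it
// stops throwing, or rethrow the last error once the time budget is spent.
async function retryBlock(
  block: () => void | Promise<void>,
  options: { timeoutMs: number; intervalMs: number },
): Promise<void> {
  const deadline = Date.now() + options.timeoutMs;
  for (;;) {
    try {
      await block();
      return;
    } catch (error) {
      if (Date.now() >= deadline) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
}

// The action and the check are retried together, so a dropped click is repeated.
async function placeOrder(button: LateHydratedButton): Promise<void> {
  await retryBlock(
    () => {
      button.click();
      if (button.landed === 0) throw new Error("click has not landed yet");
    },
    { timeoutMs: 2_000, intervalMs: 50 },
  );
}
```

Retrying only the check would spin until the timeout, because nothing would ever click again. That is the case toPass covers and a plain web-first assertion does not.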

3. expect.poll() — for values Playwright can't see

When the value you care about isn't in the DOM (an API response, something placed in localStorage, or a window-level variable), expect.poll calls your function on an interval and runs a single matcher against the result.

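// wait for this endpoint to respond with a 200
// (assumes a baseURL is configured, so page.request can resolve the relative path)
await expect
  .poll(async () => (await page.request.get("/api/health")).status(), { timeout: 10_000 })
  .toBe(200);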
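
Conceptually, polling is just a read, a matcher, and a wait, repeated. The sketch below is a simplified stand-in, not Playwright's implementation: pollUntil and the appState object are hypothetical, and the interval schedule [100, 250, 500, 1000] is an assumption modeled on the probe intervals Playwright documents for expect.poll.

```typescript
// Hypothetical stand-in for state the DOM never shows, e.g. a window-level variable.
const appState: { cartCount: number } = { cartCount: 0 };

// Simplified polling loop: read a value, test it with one matcher, wait, repeat.
async function pollUntil<T>(
  read: () => T | Promise<T>,
  matches: (value: T) => boolean,
  options: { timeoutMs: number; intervalsMs?: number[] },
): Promise<T> {
  const intervals = options.intervalsMs ?? [100, 250, 500, 1000];
  const deadline = Date.now() + options.timeoutMs;
  for (let probe = 0; ; probe += 1) {
    const value = await read();
    if (matches(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`value never matched; last value was ${String(value)}`);
    }
    // Walk through the schedule, then keep reusing its last entry.
    const wait = intervals[Math.min(probe, intervals.length - 1)];
    await new Promise<void>((resolve) => setTimeout(resolve, wait));
  }
}

// Simulate the app updating its state a little later, outside the DOM.
setTimeout(() => {
  appState.cartCount = 3;
}, 120);

const settledCount: Promise<number> = pollUntil(
  () => appState.cartCount,
  (count) => count === 3,
  { timeoutMs: 2_000 },
);
```

Unlike toPass, nothing is clicked or redone here: the callback only observes a value, which is why expect.poll suits state that settles on its own.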

4. page.addLocatorHandler() — for overlays you don't control

Sometimes a newsletter popup, a cookie banner, or a "you've been logged out" dialog appears on the page in the middle of an unrelated test. The test wasn't expecting it, the click lands on the dialog instead of your button, and the test fails.

addLocatorHandler registers the cleanup once: from then on, whenever Playwright is about to perform an action and the overlay is visible, it runs your handler first.

await page.addLocatorHandler(
  page.getByText("Subscribe for 10% off"),
  async () => {
    await page.getByRole("button", { name: "No thanks" }).click();
  },
);

The handler only fires when needed. If the overlay never appears, the handler never runs.

5. Project retries + flaky markings

In playwright.config.ts:

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
});

Pattern: 0 locally, 2 in CI. You see flake during development. CI doesn't go red on the first hiccup. If a test fails on attempt 1 and passes on attempt 2, the HTML report marks it flaky — that's a signal to investigate, not a green tick.

Warning

Retries hide flakiness, they don't fix it. Treat any test that needs retries as a bug to investigate. It's tough, but better than a test nobody looks at anymore.

Anti-patterns

await page.waitForLoadState('networkidle') — the lying check

networkidle waits for the network to stop. On a real page with analytics, A/B testing, polling fetches, or anything else that talks to the server periodically, the network never goes idle — and on a quiet page, the network goes idle long before the UI is ready.

Additionally, waiting for the network to become quiet isn't user-first. Do you wait for all the JS files to be loaded before you click a button? I doubt it.

Avoid

Waits for the network to be quiet for 500 ms — a signal that rarely matches the user-visible state.

await page.waitForLoadState("networkidle");
await page.goto("/", { waitUntil: "networkidle" });

Prefer

Waits for the UI signal the user would wait for — the confirmation heading appearing.

await expect(
  page.getByRole("heading", { name: "Order confirmed" })
).toBeVisible();

Trusting your hardware

A test that passes on your laptop and fails in CI isn't a CI bug. CI is slower, more contended, and runs cold. If the test only passes when you have eight cores to spare, the test is the problem.


Hands on

Practice fighting flake

This shop has a ?chaos query parameter that turns on real-world friction:

  • a resource that loads for ages
  • a newsletter overlay that interrupts navigation
  • a login submit that doesn't work on first click

Copy your existing login test, change the initial goto('/') call to goto('/?chaos'), and try to make the test fast and stable.

Exercise 1 of 2

Add a locator handler for the newsletter overlay

  1. Open the workshop store with ?chaos enabled
  2. Try to run your add-to-cart tests
  3. Add page.addLocatorHandler(...) to the test (or to a beforeEach) that dismisses the overlay by clicking No thanks.
  4. Re-run the test. It should pass without you having to manually wait for or click the overlay.

Exercise 2 of 2

Stabilize the flaky login with toPass

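  1. The login form on ?chaos URLs drops some submit clicks. The button click "succeeds", but nothing happens. Another click does the trick.
  2. Wrap the submit click and the assertion that proves you are logged in inside expect().toPass(), so the click is retried until it lands.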
  3. Run the test 10 times in a row with --repeat-each=10 and confirm it's stable.