Drive a flow, create tests and debug

Have your coding agent turn a CLI-driven session into a runnable Playwright spec — and understand exactly what it's doing under the hood.

You watched an agent run a browser session in the previous lesson — open, search, click around. Useful, but no test came out of it. Now we'll extend that flow with an add-to-cart step and turn the transcript into a runnable spec.

As you've seen, every playwright-cli action emits the equivalent Playwright TypeScript. If you drive a flow once, all the information to generate a test is available.

However, there's one caveat: the CLI emits actions only — assertions are your (or your agent's) job.
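
To make the split between actions and assertions concrete, here is a minimal sketch of the transcription step. It assumes a hypothetical transcript in which every emitted code line starts with `await page.`; the real CLI output may be shaped differently, and `buildSpec` is an illustrative helper, not part of playwright-cli.

```typescript
// Hypothetical helper: collect emitted `await page.…` lines into a spec skeleton.
// The transcript format below is an assumption, not documented CLI output.
function buildSpec(title: string, transcript: string): string {
  const actions = transcript
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("await page."));

  return [
    'import { test, expect } from "@playwright/test";',
    "",
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    ...actions.map((line) => `  ${line}`),
    "  // TODO: the CLI emits actions only; add assertions here.",
    "});",
  ].join("\n");
}

// Made-up transcript: two emitted actions surrounded by other output.
const sampleTranscript = `
(snapshot output omitted)
await page.goto("https://www.playwright-workshop.online/");
(snapshot output omitted)
await page.getByRole("button", { name: "Add item to cart" }).click();
`;

const spec = buildSpec("adds a snowboard to the cart", sampleTranscript);
console.log(spec);
```

The skeleton replays the flow but proves nothing yet, which is why the TODO line sits where the assertions belong.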

A complete cycle, prompted

Open the agent of your choice in your workshop project and prompt:

Open https://www.playwright-workshop.online/, search for snowboard, click the first product, add it to the cart and validate the cart. Generate a new Playwright test: tests/snowboard.spec.ts. Use playwright-cli to drive the browser. Add assertions that matter for the buyer.

Prompt for your coding agent

The agent reads the skill, drives the flow with playwright-cli, and collects every emitted await page.… line into a spec. Yours will look different — but it'll have roughly this shape:

import { test, expect } from "@playwright/test";

test("adds a snowboard to the cart from search", async ({ page }) => {
  await page.goto("https://www.playwright-workshop.online/");

  await page
    .getByRole("searchbox", { name: "Search products" })
    .fill("snowboard");
  await page.keyboard.press("Enter");

  await page.getByRole("link", { name: "All Mountain Snowboard" }).click();
  await page.getByRole("button", { name: "Add item to cart" }).click();

  await expect(page.getByLabel("Open cart")).toContainText("1");
});

Run it (or let your agent run it):

$ npx playwright test tests/snowboard.spec.ts

Inline exercise

Is this actually a good test?

Your generated test will look different! Stop and read your generated spec as if it landed in a PR for review. The CLI drove the browser, the agent picked the locators and the assertion.

But you'll always be the one who decides whether this ships.

Find things you'd push back on before approving it.

A few places to look:

  • The locators. Does each one uniquely identify the element you mean, or is it "whichever happens to match first"?
  • The assertions. Does toContainText("1") on the entire Open cart button actually prove a snowboard is in the cart? And is it the right web-first assertion for the job?
  • What's missing. Did the user reach a product page? Is the cart's content asserted, or only its counter?

The point isn't that the agent was wrong. It did exactly what the prompt asked. The agent ships drafts. You ship tests. Reviewing what comes back is essential. You're still in charge!
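
Part of that review can be roughed out as a checklist in code. The sketch below is a set of plain string heuristics, not a Playwright feature: `reviewSpec` and `Finding` are hypothetical names, and the checks only flag spots worth a second look (positional locators, fixed waits, a spec with no `expect(` at all). It cannot judge whether an assertion matters for the buyer.

```typescript
// Rough review heuristics for a generated spec (illustrative only).
interface Finding {
  line: number; // 1-based line number, or 0 for whole-file findings
  message: string;
}

function reviewSpec(source: string): Finding[] {
  const findings: Finding[] = [];

  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    // .first(), .last() and .nth() pick by position rather than by meaning.
    if (/\.(first|last)\(\)|\.nth\(/.test(text)) {
      findings.push({ line, message: "positional locator: is this 'whichever matches first'?" });
    }
    // Fixed sleeps hide timing problems instead of asserting on state.
    if (text.includes("waitForTimeout(")) {
      findings.push({ line, message: "fixed wait: prefer a web-first assertion" });
    }
  });

  if (!source.includes("expect(")) {
    findings.push({ line: 0, message: "no assertions: the spec only replays actions" });
  }
  return findings;
}

// Made-up draft to exercise the checks.
const draft = [
  'await page.goto("/");',
  'await page.getByRole("link").first().click();',
  "await page.waitForTimeout(2000);",
].join("\n");

const findings = reviewSpec(draft);
findings.forEach((f) => console.log(`${f.line}: ${f.message}`));
```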

Tip

FYI: when the page snapshot doesn't give your agent enough to anchor an assertion, the CLI has two utilities worth knowing about:

  • playwright-cli generate-locator <ref> emits the locator expression you'd hand to expect(...).
  • playwright-cli eval "el => el.dataset.testid" <ref> reads the attributes the snapshot strips out (id, class, data-*). Useful when a generic in the YAML doesn't have a name to key off.

When to reach for AI generation — and when not

Your agent hands you a spec draft. Whether to take it depends on what you're testing — and on whether the run was worth the spend.

Generation isn't free

Every run burns seconds to minutes of wall time and tokens billed to your API key. Spend that budget where the agent pays it back.

AI codegen: hand off to AI

The agent transcribes the flow and picks the assertions; Playwright codegen only records clicks. Read the diff and ship.

  • Pages with semantic markup: stable roles and labels give the agent something to anchor on.
  • Many similar tests: once one spec exists, the agent pattern-matches the next twenty.
  • Refactor scaffolding: throwaway specs that disappear in a week.

Prompt it, read it, ship it, move on.

Playwright codegen: generate yourself

A deterministic recorder, no API call. You drive every line and add the assertions where they matter.

  • Critical paths: payments, auth. You want to see and edit every line.
  • Pages thin on semantics: agents guess; codegen records exactly what you click.
  • Flake: agents paper over timing with waitForTimeout; you won't.

Run playwright codegen, then adjust by hand.

Which one should I choose?

Totally up to you. Workshops won't tell you the right answer here.

You have low AI trust?

Earn trust slowly. Hand-write or drive from playwright codegen. Let the agent only write the tests you'd otherwise skip to save time.

You have high AI trust already?

Let the agent draft most specs. You step in for critical paths and edit what doesn't fit. You know how easy it is to lose control.

But! Always review the outcome. Always. Always. Always! You're still in charge!

The bar is the same as code review: would you ship it as-is? If yes, ship. If no, you have a starting point — keep editing until you would.

When generation goes sideways

Real pages bite. Locators match three things, an overlay eats a click, the assertion fires a beat before the cart updates. When the generated test runs red — or the agent stalls mid-flow — the lightweight move is to hand the problem back with the same toolbox that built it.

tests/snowboard.spec.ts fails on the cart assertion. Use playwright-cli to drive the page, find what's actually there, and propose a fix.

Prompt for your coding agent

That prompt alone gets you surprisingly far. The agent re-attaches a browser and works the page directly:

  • playwright-cli snapshot — what's actually rendered, with the same refs the spec would use.
  • playwright-cli console — app-side errors the test never surfaced.
  • playwright-cli requests — did a request fail, return the wrong shape, or land too late?
  • playwright-cli show --annotate — when the agent can't find something on its own, it can ask you to point at it.

Replaying the failing step emits the corrected await page.… line — that's what gets pasted back into the spec. No new vocabulary, no special mode. Agents are pretty good at figuring this stuff out these days — hand them the tools and the outcome, they iterate.
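
One of the failure modes named above, a locator that matches three things, is worth a closer look. Playwright locators are strict: an action on a locator that resolves to more than one element fails instead of silently picking one. The sketch below models that rule in plain TypeScript. `FakeElement` and `resolveStrict` are hypothetical stand-ins, the product names other than "All Mountain Snowboard" are made up, and the name matching is simplified to a case-sensitive substring check.

```typescript
// Conceptual model of strict locator resolution (not Playwright's implementation).
interface FakeElement {
  role: string;
  name: string;
}

function resolveStrict(elements: FakeElement[], role: string, name: string): FakeElement {
  const matches = elements.filter((el) => el.role === role && el.name.includes(name));
  if (matches.length === 0) {
    throw new Error(`no ${role} named "${name}"`);
  }
  if (matches.length > 1) {
    throw new Error(`strict mode violation: ${matches.length} elements match ${role} "${name}"`);
  }
  return matches[0];
}

// Made-up search results page.
const pageElements: FakeElement[] = [
  { role: "link", name: "All Mountain Snowboard" },
  { role: "link", name: "Freestyle Snowboard" },
  { role: "link", name: "Snowboard Wax" },
];

// A specific name resolves to exactly one element.
console.log(resolveStrict(pageElements, "link", "All Mountain").name);

// A vague name matches all three links and fails loudly.
try {
  resolveStrict(pageElements, "link", "Snowboard");
} catch (error) {
  console.log((error as Error).message);
}
```

This is why narrowing the locator is usually a better fix than adding a positional pick: the narrower locator says which element you mean.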

Two processes, one session

--debug=cli really just means two processes running at once. The runner pauses your test and prints a session name (e.g. tw-abcdef). The agent uses that name to attach to the paused page.
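
The handoff between the two processes is just that session name. As a rough sketch, assuming the runner's output contains a name shaped like the `tw-abcdef` example (the exact output format is not shown in this lesson, so the pattern and the sample line are assumptions), a hypothetical helper could turn it into the attach command:

```typescript
// Hypothetical helper: find the session name in the runner's output and build
// the attach command. The `tw-…` pattern is inferred from the example name in
// this lesson; the real output format may differ.
function attachCommand(runnerOutput: string): string | null {
  const match = runnerOutput.match(/\btw-[a-z0-9]+\b/);
  return match ? `playwright-cli attach ${match[0]}` : null;
}

// Made-up output line for illustration.
const sampleOutput = "paused, session: tw-abcdef";

console.log(attachCommand(sampleOutput));
```

In practice the agent reads the name straight from the terminal; the point is only that nothing else links the paused test to the CLI session.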

Test runner: npx playwright test --debug=cli

Runs in the background, paused at the first line of your failing test. Holds everything the test had wired up and prints a session name when it pauses.

  • Fixtures applied: beforeEach / beforeAll have already run before the agent shows up.
  • Storage state loaded: cookies, localStorage, signed-in user; the test's real view of the world.
  • Route mocks active: page.route() handlers keep firing while the agent drives the page.

Stays paused. Stop it when you're done.

Agent CLI: playwright-cli attach tw-abcdef

The agent's session, joined to the paused test by the session name. Same toolbox as anywhere else, but it drives the test's actual page.

  • snapshot: what the paused page renders, with the test's data already loaded.
  • console / requests: errors and traffic the assertion never surfaced before it failed.
  • click / fill / press: every action emits the await page.… line you paste back into the spec.

Iterates against the test's real context, not a fresh tab.

Warning

npx playwright test --debug=cli is the fancy version. It pauses your failing test, prints a session name, and lets the agent playwright-cli attach into a paused page with the test's fixtures, storage, and route mocks already applied. Promising on paper. In my hands, the harness is flaky enough that I keep falling back to "just tell the agent to use the CLI." Try it; your mileage may vary.

Hands on

Go with the flow and generate more tests

Exercise 1 of 2

Generate a new test for Chaos mode

Re-run the snowboard prompt, but this time point the agent at the workshop store with chaos enabled:

Open https://www.playwright-workshop.online/?chaos, search for snowboard, click the first product, add it to the cart and validate the cart. Generate a Playwright test at tests/snowboard-chaos.spec.ts. Use playwright-cli to drive the browser.

Prompt for your coding agent

?chaos enables real-world friction: a newsletter overlay pops up on the homepage. The agent has to deal with all of it without your help. Will it manage?

Exercise 2 of 2

Skim the playwright-cli skill and references

Open the Playwright CLI skill (e.g. .claude/skills/playwright-cli/SKILL.md) in your editor. You don't have to memorise it; your agent will. But scan the section headers (Core, Navigation, Storage, Network, DevTools…) so you know what's there when you write the next prompt. Spot the commands you've already seen the agent use, and the ones you haven't.

Pay special attention to the reference files (e.g. references/test-generation.md). Ideally, the agent loads these on demand, but sometimes you need to point it there.