The agent transcribes the flow and picks the assertions — Playwright codegen only records clicks. Read the diff and ship.
Drive a flow, create tests and debug
Have your coding agent turn a CLI-driven session into a runnable Playwright spec — and understand exactly what it's doing under the hood.
You watched an agent run a browser session in the previous lesson — open, search, click around. Useful, but no test came out of it. Now we'll extend that flow with an add-to-cart step and turn the transcript into a runnable spec.
As you've seen, every playwright-cli action emits the equivalent Playwright TypeScript. If you drive a flow once, all the information to generate a test is available.
However, there's one caveat: the CLI emits actions only — assertions are your (or your agent's) job.
A complete cycle, prompted
Open the agent of your choice in your workshop project and prompt:
Open https://www.playwright-workshop.online/, search for snowboard, click the first product,
add it to the cart and validate the cart. Generate a new Playwright test:
tests/snowboard.spec.ts. Use playwright-cli to drive the browser. Add
assertions that matter for the buyer.
The agent reads the skill, drives the flow with playwright-cli, and collects every emitted await page.… line into a spec. Yours will look different — but it'll have roughly this shape:
import { test, expect } from "@playwright/test";

test("adds a snowboard to the cart from search", async ({ page }) => {
  await page.goto("https://www.playwright-workshop.online/");
  await page
    .getByRole("searchbox", { name: "Search products" })
    .fill("snowboard");
  await page.keyboard.press("Enter");
  await page.getByRole("link", { name: "All Mountain Snowboard" }).click();
  await page.getByRole("button", { name: "Add item to cart" }).click();
  await expect(page.getByLabel("Open cart")).toContainText("1");
});
Run it (or let your agent run it):
$ npx playwright test tests/snowboard.spec.ts
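If you're curious what "collecting every emitted await page.… line into a spec" amounts to, here is a minimal sketch. It is not how your agent or playwright-cli does it internally; buildSpec is a hypothetical helper, and it assumes each emitted action arrives as a single line of text mixed in with other output.

```typescript
// Hypothetical helper for illustration; not part of playwright-cli or @playwright/test.
// Assumption: every emitted action is one line that starts with "await page.".
function buildSpec(title: string, emitted: string[]): string {
  // Keep only the lines that look like emitted Playwright actions.
  const actions = emitted
    .map((line) => line.trim())
    .filter((line) => line.startsWith("await page."));

  // Indent the actions so they sit inside the test body.
  const body = actions.map((line) => `  ${line}`).join("\n");

  return [
    'import { test, expect } from "@playwright/test";',
    "",
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    body,
    "  // TODO: add assertions here; the CLI emits actions only.",
    "});",
    "",
  ].join("\n");
}

// Usage: anything that is not an action line is dropped.
const draft = buildSpec("opens the store", [
  "some unrelated session output",
  'await page.goto("https://www.playwright-workshop.online/");',
  'await page.getByRole("button", { name: "Add item to cart" }).click();',
]);
console.log(draft);
```

Note the TODO the sketch leaves behind: transcribing actions is mechanical, choosing assertions is not.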
Is this actually a good test?
Your generated test will look different! Stop and read your generated spec as if it landed in a PR for review. The CLI drove the browser, the agent picked the locators and the assertion.
But you'll always be the one who decides whether this ships.
Find things you'd push back on before approving it.
A few places to look:
- The locators. Does each one uniquely identify the element you mean, or is it "whichever happens to match first"?
- The correct assertions. Does toContainText("1") on the entire Open cart button actually prove a snowboard is in the cart, and are the correct web-first assertions in use?
- What's missing. Did the user reach a product page? Is the cart's content asserted, or only its counter?
The point isn't that the agent was wrong. It did exactly what the prompt asked. The agent ships drafts. You ship tests. Reviewing what comes back is essential. You're still in charge!
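To make the review checklist above concrete, here is a rough sketch of a first-pass scan over a spec's source. reviewSpec and its patterns are hypothetical, invented for this example; they are heuristics to point your eyes somewhere, not Playwright rules, and they don't replace reading the diff.

```typescript
// Hypothetical review aid; the checks are heuristics chosen for this example.
interface Finding {
  line: number; // 1-based line in the spec; 0 means "whole file"
  note: string;
}

const CHECKS: { pattern: RegExp; note: string }[] = [
  { pattern: /waitForTimeout\(/, note: "fixed sleep; a web-first assertion is usually sturdier" },
  { pattern: /\.(first|last)\(\)|\.nth\(/, note: "position-based locator; does it pin the element you mean?" },
  { pattern: /toContainText\(\s*["']\d+["']\s*\)/, note: "digit-only text match; it also passes for other text containing that digit" },
];

function reviewSpec(source: string): Finding[] {
  const findings: Finding[] = [];

  // Line-level checks: report every pattern that matches a line.
  source.split("\n").forEach((text, index) => {
    for (const { pattern, note } of CHECKS) {
      if (pattern.test(text)) findings.push({ line: index + 1, note });
    }
  });

  // File-level check: nothing in the spec confirms where the flow ended up.
  if (!/toHaveURL\(/.test(source)) {
    findings.push({ line: 0, note: "no URL assertion; did the flow reach the page you expect?" });
  }
  return findings;
}
```

Run against the sample spec from earlier, a scan like this would flag the toContainText("1") line and the missing URL check. Whether those matter for the buyer is still your call.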
FYI: when the page snapshot doesn't give your agent enough to anchor an assertion, the CLI has two utilities worth knowing about:
- playwright-cli generate-locator <ref> emits the locator expression you'd hand to expect(...).
- playwright-cli eval "el => el.dataset.testid" <ref> reads the attributes the snapshot strips out (id, class, data-*). Useful when a generic in the YAML doesn't have a name to key off.
When to reach for AI generation — and when not
Your agent hands you a spec draft. Whether to take it depends on what you're testing — and on whether the run was worth the spend.
Every run burns seconds to minutes of wall time and tokens billed to your API key. Spend that budget where the agent pays it back.
A deterministic recorder, no API call. You drive every line and add the assertions where they matter.
waitForTimeout; you won't
Totally up to you. Workshops won't tell you the right answer here.
Earn trust slowly. Hand-write or drive from playwright codegen. Let the agent only write the tests you'd otherwise skip to save time.
Let the agent draft most specs. You step in for critical paths and edit what doesn't fit. You know how easy it is to lose control.
But! Always review the outcome. Always. Always. Always! You're still in charge!
The bar is the same as code review: would you ship it as-is? If yes, ship. If no, you have a starting point — keep editing until you would.
When generation goes sideways
Real pages bite. Locators match three things, an overlay eats a click, the assertion fires a beat before the cart updates. When the generated test runs red — or the agent stalls mid-flow — the lightweight move is to hand the problem back with the same toolbox that built it.
tests/snowboard.spec.ts fails on the cart assertion. Use playwright-cli to
drive the page, find what's actually there, and propose a fix.
That prompt alone gets you surprisingly far. The agent re-attaches a browser and works the page directly:
- playwright-cli snapshot: what's actually rendered, with the same refs the spec would use.
- playwright-cli console: app-side errors the test never surfaced.
- playwright-cli requests: did a request fail, return the wrong shape, or land too late?
- playwright-cli show --annotate: when the agent can't find something on its own, it can ask you to point at it.
Replaying the failing step emits the corrected await page.… line — that's what gets pasted back into the spec. No new vocabulary, no special mode. Agents are pretty good at figuring this stuff out these days — hand them the tools and the outcome, they iterate.
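One of the failure modes mentioned above, an assertion firing a beat before the cart updates, is the case web-first assertions are meant to absorb: they re-check until the condition holds or a timeout runs out. The sketch below shows only that retry idea in plain TypeScript. expectEventually is a hypothetical helper, not Playwright's implementation, and the default timings are arbitrary; in a real spec you'd lean on the built-in expect(locator) assertions instead.

```typescript
// Hypothetical helper illustrating retry-until-timeout; not Playwright's implementation.
async function expectEventually<T>(
  read: () => T | Promise<T>,
  matches: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 50 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  let last = await read();

  // Re-read the value until it matches or the deadline passes.
  while (!matches(last)) {
    if (Date.now() >= deadline) {
      throw new Error(
        `Condition not met within ${timeoutMs}ms; last value: ${String(last)}`,
      );
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    last = await read();
  }
  return last;
}
```

A one-shot check reads the value once and fails if the cart hasn't caught up yet; the retrying version fails only when the cart never catches up within the timeout, and the error carries the last value it saw.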
--debug=cli only really means two things running at once. The runner pauses your test and prints a session name (e.g. tw-abcdef). The agent uses that name to attach into the paused page.
npx playwright test --debug=cli
Runs in the background, paused at the first line of your failing test. Holds everything the test had wired up and prints a session name when it pauses.
playwright-cli attach tw-abcdef
The agent's session, joined to the paused test by the session name. Same toolbox as anywhere else, but it drives the test's actual page.
npx playwright test --debug=cli is the fancy version. It pauses your failing test, prints a session name, and lets the agent playwright-cli attach into a paused page with the test's fixtures, storage, and route mocks already applied. Promising on paper. In my hands, the harness is flaky enough that I keep falling back to "just tell the agent to use the CLI." Try it; your mileage may vary.
Go with the flow and generate more tests
Generate a new test for Chaos mode
Re-run the snowboard prompt, but this time point the agent at the workshop store with chaos enabled:
Open https://www.playwright-workshop.online/?chaos, search for snowboard, click the first
product, add it to the cart and validate the cart. Generate a Playwright test
at tests/snowboard-chaos.spec.ts. Use playwright-cli to drive the browser.
?chaos enables real-world friction — a newsletter overlay pops on the homepage. The agent has to deal with all of it without your help. Will it manage?
Skim the playwright-cli skill and references
Open the Playwright CLI skill (e.g. .claude/skills/playwright-cli/SKILL.md) in your editor. You don't have to memorise it; your agent will. But scan the section headers (Core, Navigation, Storage, Network, DevTools…) so you know what's there when you write the next prompt. Spot the commands you've already seen the agent use, and the ones you haven't.
Take special care to check the reference files too (e.g. references/test-generation.md). Ideally, the agent loads these on demand, but sometimes you need to point it there.