Browser as a feedback signal

Updated May 29, 2026

Screenshots, console errors, and DOM refs are how the agent learns the UI is wrong.

Why the browser belongs in the harness

Code-only feedback — unit tests, type checks, linters — catches the bugs the code knows about. The UI fails for different reasons. A button overlaps its label by two pixels. The focus ring lands on the wrong element after a modal closes. A dark mode background reads as black on black. None of those break a test. The agent ships them confidently because nothing in the tool surface said otherwise.

Without sight, the agent cannot tell that anything is wrong. Giving it eyes is not a nice-to-have; it is the difference between a harness that catches UI regressions and one that does not. The agents browser command turns the page the user sees into bytes the agent can read.

Create a profile

agents browser profiles create research --browser chrome

A profile is a named, persistent browser identity. Cookies, login state, and installed extensions persist between runs. Sign in once to the sites you want the agent to reach; every subsequent run inherits the session. Two profiles run in parallel without sharing state.
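If you keep one profile per concern, a small loop is enough. This is a minimal sketch: create_profiles is a hypothetical helper name, and it uses only the profiles create invocation shown above. What the command does when a profile already exists is not covered here, so the helper simply stops at the first non-zero exit status (assuming failures are reported that way).

```shell
# Sketch: create one Chrome profile per name given on the command line.
# create_profiles is a hypothetical helper, not part of agents-cli.
create_profiles() {
  for name in "$@"; do
    # Stop at the first failure (assumes a non-zero exit status on error).
    agents browser profiles create "$name" --browser chrome || return 1
  done
}
```

For example, create_profiles research staging would create two profiles that do not share cookies or logins; the name staging is only an illustration.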

Start a task and navigate

agents browser start --profile research --task my-task
export AGENTS_BROWSER_TASK=my-task
agents browser navigate --url https://example.com/dashboard

A task is a scoped, named browsing session bound to one profile. The AGENTS_BROWSER_TASK environment variable tells every subsequent agents browser subcommand which task to address. The navigate subcommand loads a URL in the task's active tab and waits for the document to settle.
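The three steps above can be folded into one function so that a failed start never leads to a navigate against the wrong task. A sketch only: open_route is a hypothetical name, and it assumes the subcommands signal failure with a non-zero exit status.

```shell
# Sketch: start a task on a profile, select it, and load one URL.
# open_route is a hypothetical helper. Arguments:
#   $1 = profile name, $2 = task name, $3 = URL to load
open_route() {
  agents browser start --profile "$1" --task "$2" || return 1
  # Later agents browser subcommands read this variable to find the task.
  export AGENTS_BROWSER_TASK="$2"
  agents browser navigate --url "$3"
}
```

Calling open_route research my-task https://example.com/dashboard runs the same three commands shown above, in the same order. Because the function exports the variable in the current shell, later subcommands in that shell address the same task.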

Take a screenshot the agent can see

agents browser screenshot
agents browser screenshot --quality raw

The default screenshot is a JPEG capped at roughly 100 KB. That size is deliberate: it stays inside one image-token budget for the model, keeps round-trip latency under a second, and is more than enough resolution for layout review. --quality raw returns a pixel-faithful PNG when the agent is debugging anti-aliasing, font hinting, or sub-pixel alignment and the lossy version would hide the bug.
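The choice between the two modes can be written down as a rule rather than left to memory. In this sketch, capture and its two mode names (layout, pixel) are hypothetical; only the two screenshot invocations come from the text above.

```shell
# Sketch: pick the screenshot mode by the kind of review being done.
# capture, "layout", and "pixel" are hypothetical names.
capture() {
  case "$1" in
    # Layout review: the default size-capped JPEG is enough.
    layout) agents browser screenshot ;;
    # Anti-aliasing, font hinting, sub-pixel alignment: lossless PNG.
    pixel)  agents browser screenshot --quality raw ;;
    *)      echo "usage: capture layout|pixel" >&2; return 2 ;;
  esac
}
```

Defaulting to capture layout and switching to capture pixel only when the lossy image could hide the bug keeps most runs on the small image.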

Read console errors

agents browser console --level error

The command returns the structured console logs recorded since the task started, filtered by level. Each entry is one line with a timestamp, level, message, and source URL. The agent quotes the entries directly into its plan, with no copy-paste from a screenshot of devtools and no hallucinated stack frames.
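Since each entry is one line, a harness can treat any output at error level as a failed check. The sketch below assumes the command prints nothing when there are no errors, which the text above does not state; console_gate is a hypothetical name.

```shell
# Sketch: fail when the task has logged any console errors.
# console_gate is a hypothetical helper. Assumes one entry per line
# and empty output when nothing was logged at error level.
console_gate() {
  errors=$(agents browser console --level error) || return 2
  # Count non-blank lines; grep -c exits 1 on a zero count, hence "|| true".
  count=$(printf '%s\n' "$errors" | grep -c '[^[:space:]]' || true)
  if [ "$count" -gt 0 ]; then
    printf '%s console error(s) since task start:\n%s\n' "$count" "$errors" >&2
    return 1
  fi
  return 0
}
```

The function returns 0 when the console is clean, 1 when there are errors (printed to stderr so they can be quoted), and 2 when the command itself could not run.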

DOM refs for interaction

agents browser refs
agents browser click 42
agents browser type 15 --text "search query"

The refs subcommand returns a numbered list of every interactive element on the page (links, buttons, inputs, focusable controls), with the visible label and the element role next to each number. The agent addresses elements by number, so it never has to write a CSS selector. Selectors are brittle: class names change, structures get refactored, attribute orders flip. Refs are stable for the lifetime of the page.
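A script can look a ref up by its visible label instead of hard-coding the number. This is a sketch under a stated assumption: the exact output format of refs is not documented here, so the code assumes each line begins with the ref number (for example "42 button Save"). ref_for and click_label are hypothetical helpers.

```shell
# Sketch: resolve a visible label to a ref number, then click it.
# Assumes each line of `agents browser refs` starts with the ref number;
# that format is an assumption, not documented behaviour.
ref_for() {
  agents browser refs | awk -v label="$1" '
    index($0, label) {
      n = $1
      gsub(/[^0-9]/, "", n)   # tolerate forms such as "42." or "[42]"
      print n
      exit                    # first matching line wins
    }'
}

click_label() {
  ref=$(ref_for "$1")
  if [ -z "$ref" ]; then
    echo "no ref matches: $1" >&2
    return 1
  fi
  agents browser click "$ref"
}
```

Because refs are stable only for the lifetime of the page, the lookup runs again on every call rather than caching numbers across navigations.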

Engineering the harness with this

The harness lesson is the same as for every other surface in agents-cli: write the rule once, in the place the agent reads at the start of every run. Add to AGENTS.md:

When you change UI, take a screenshot of the affected route
with `agents browser screenshot` and quote the file path in your
final reply. A claim of "UI updated" without a screenshot is
unverified and will be rejected.

From then on, the agent takes a screenshot with every UI change. The reviewer no longer asks “does it look right?” The screenshot answers first. That is the harness: one rule, written once, that makes the failure mode structurally impossible.
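The rejection half of the rule can also be checked mechanically. A sketch, assuming the quoted path ends in .jpg, .jpeg, or .png (the default screenshot is a JPEG and the raw one a PNG); reply_has_screenshot is a hypothetical helper, and it checks only that a path is quoted, not that the file exists.

```shell
# Sketch: succeed only if the reply on stdin quotes an image file path.
# reply_has_screenshot is a hypothetical helper; the extension list is
# an assumption based on the JPEG and PNG formats described above.
reply_has_screenshot() {
  grep -Eiq '[^[:space:]]+\.(jpe?g|png)([^[:alnum:]]|$)'
}
```

Piping the agent's final reply into reply_has_screenshot gives a reviewer or a CI step a yes/no answer before anyone opens the image.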