Lab Tooling and CI

Lab tooling runs the rendering pipeline under controlled, repeatable conditions so a frame-budget regression is caught in continuous integration instead of in production. Lighthouse CI, WebPageTest scripting, and performance budgets turn the metrics you observe in the field into hard pass/fail gates on every commit. This is part of Rendering Performance Metrics and Tooling, and it is the gate that catches the regressions the field instrumentation in PerformanceObserver API Patterns would otherwise only report after release.

Lab Versus Field

Field data (Core Web Vitals collected from real users through observers) is the source of truth for what users actually experience, but it arrives after release and is noisy with device and network variance. Lab data is synthetic: a fixed device profile, throttled CPU, and a simulated network, run on demand. You trade realism for reproducibility. You use the lab to fail a pull request deterministically; you use the field to confirm the fix moved the distribution. The two are complementary, and the metrics line up: lab Total Blocking Time is the usual lab proxy for field INP, and lab CLS approximates field CLS.

tool	layer	best for
Lighthouse CI	synthetic audit	per-commit pass/fail on CWV-style metrics
WebPageTest	synthetic, scripted	multi-step flows, main-thread and long-task traces
Performance budgets	assertion layer	hard limits on metrics and resource bytes

Performance Budgets

A performance budget is a number a metric must not exceed, enforced as a build failure. Budgets come in two flavours: timing budgets (TBT < 200ms, LCP < 2.5s, CLS < 0.1) and resource budgets (script < 170KB, total < 1.6MB, request count < 50). Timing budgets guard the experience; resource budgets guard the cause, since bytes shipped are a leading indicator of main-thread work and therefore of long tasks. Both belong in CI so a 40KB dependency bump that pushes TBT past its 200ms budget is rejected at the pull request, not discovered in next week’s field data.

[Budget assertion on a regressing commit]
  metric        baseline   this build   budget    result
  TBT ........... 140ms      270ms       200ms     ✗ FAIL
  LCP ........... 2.1s       2.2s        2.5s      ✓ pass
  CLS ........... 0.04       0.05        0.10      ✓ pass
  script bytes .. 150KB      198KB       170KB     ✗ FAIL
  → CI exits non-zero, merge gate blocks
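
The comparison in the table above can be sketched in a few lines. This is a minimal illustration, not how Lighthouse CI implements its assertions: the helper name checkBudgets, the metric keys, and the data shape are all assumptions made for the example. The numbers are the ones from the table.

```javascript
// Hypothetical helper: compare measured values against budget limits.
// Metric keys and the object shape are illustrative assumptions.
function checkBudgets(measured, budgets) {
  return Object.keys(budgets).map((metric) => {
    const value = measured[metric];
    const limit = budgets[metric];
    // A missing or non-numeric measurement counts as a failure,
    // so a broken collector cannot silently pass the gate.
    const pass = typeof value === 'number' && value <= limit;
    return { metric, value, limit, pass };
  });
}

// Values from the example table (ms, ms, unitless, KB).
const budgets = { tbt: 200, lcp: 2500, cls: 0.1, scriptKB: 170 };
const thisBuild = { tbt: 270, lcp: 2200, cls: 0.05, scriptKB: 198 };

const results = checkBudgets(thisBuild, budgets);
const failed = results.filter((r) => !r.pass).map((r) => r.metric);
console.log(failed); // the metrics that would block the merge
```

Treating a missing measurement as a failure is a deliberate choice in this sketch: a gate that passes when the collector produces nothing is worse than no gate.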

Lighthouse CI

Lighthouse CI wraps the Lighthouse audit engine for automation: it collects N runs, aggregates them to dampen run-to-run variance, asserts the results against a config, and optionally uploads reports to a server for trend tracking. The assertions are where the budget lives: you declare each metric’s allowed maximum and the run fails if the aggregated value exceeds it. To gate on the median specifically, set aggregationMethod to 'median' on each assertion rather than relying on the default aggregation.

// lighthouserc.js — the assertion config that turns an audit into a gate
module.exports = {
  ci: {
    collect: { numberOfRuns: 5 }, // median of 5 dampens CPU-throttle noise
    assert: {
      assertions: {
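        // aggregationMethod: 'median' makes each assertion gate on the median across runs
        'total-blocking-time': ['error', { maxNumericValue: 200, aggregationMethod: 'median' }],        // TBT budget, ms
        'largest-contentful-paint': ['error', { maxNumericValue: 2500, aggregationMethod: 'median' }],  // LCP budget, ms
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1, aggregationMethod: 'median' }],    // CLS budget, unitless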
      },
    },
  },
}

The full configuration — including budget.json resource limits and the GitHub Actions wiring — is covered in Automating Lighthouse CI Performance Budgets.
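
To see why aggregating several runs by their median dampens variance, here is a small sketch. The median helper and the five TBT samples are hypothetical and written for illustration; they are not Lighthouse CI's own code or real measurements.

```javascript
// Median of a list of numbers; an illustrative helper only.
function median(values) {
  if (values.length === 0) throw new Error('median of empty list');
  // Copy before sorting so the caller's array is not mutated; sort numerically.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Five hypothetical TBT samples in ms, one of them a noisy outlier.
const tbtRuns = [150, 140, 410, 145, 160];
const mean = tbtRuns.reduce((sum, v) => sum + v, 0) / tbtRuns.length;

console.log('median:', median(tbtRuns), 'mean:', mean);
```

With these samples the single 410ms outlier pulls the mean to 201ms, just over a 200ms budget, while the median stays at 150ms. A gate built on the mean would have failed this build on noise alone.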

WebPageTest for Frame-Level Detail

Lighthouse summarizes; WebPageTest dissects. Where Lighthouse gives you a TBT number, a scripted WebPageTest run gives you the full main-thread trace, a long-task breakdown, a filmstrip, and custom metrics you compute from the trace yourself — letting you assert directly against the 16.6ms frame budget on a specific interaction in a multi-step flow. Scripting also lets you measure pages behind login or deep in a funnel, which a single-URL audit cannot reach. The scripting language and the trace-extraction patterns are detailed in Scripting WebPageTest for Frame Budget Regressions.
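
As a rough illustration of the kind of custom metric you might compute from a trace, the sketch below summarizes a list of main-thread task durations. The input shape (a plain array of durations in ms) and the function name summarizeTasks are assumptions; extracting those durations from a real WebPageTest trace is a separate step not shown here. The blocking figure is a simplification of TBT, which only counts tasks inside a specific time window.

```javascript
const FRAME_BUDGET_MS = 16.6;      // frame budget used throughout this article
const LONG_TASK_THRESHOLD_MS = 50; // threshold above which a task counts as a long task

// `durations` is an assumed input: main-thread task durations in ms,
// already extracted from a trace by some earlier step.
function summarizeTasks(durations) {
  let longest = 0;
  let blocking = 0;
  let overFrameBudget = 0;
  for (const d of durations) {
    if (d > longest) longest = d;
    // Only the portion of a task beyond 50ms counts as blocking time.
    if (d > LONG_TASK_THRESHOLD_MS) blocking += d - LONG_TASK_THRESHOLD_MS;
    // Any task longer than one frame can cause a dropped frame.
    if (d > FRAME_BUDGET_MS) overFrameBudget += 1;
  }
  return { longest, blocking, overFrameBudget };
}

const summary = summarizeTasks([4, 12, 92, 30, 61]);
console.log(summary);
```

Note the 30ms task in the sample input: it adds nothing to the blocking total because it is under 50ms, yet it still exceeds the frame budget. That gap is the reason to assert on per-task durations for an interaction rather than on a TBT summary alone.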

Catching Frame-Budget Regressions Before Deploy

A regression worth gating is one where a frame’s main-thread work crosses 16.6ms and starts dropping frames. The CI flow that catches it:

1. Build the production bundle exactly as it ships.
2. Serve it locally and run Lighthouse CI five times against the target URLs.
3. Assert TBT, LCP, and CLS against the timing budget, and assert resource sizes against the resource budget in budget.json.
4. Run a scripted WebPageTest pass for any interaction-heavy flow and assert the longest extracted task against the frame budget.
5. Exit non-zero on any failed assertion so the merge gate blocks.

[CI run on a pull request — frame-budget regression caught]
  step 1 build ................... ok
  step 2 lhci collect (5 runs) ... median TBT 270ms
  step 3 lhci assert ............. ✗ total-blocking-time 270 > 200
  step 4 wpt longtask assert ..... ✗ longest task 92ms > 16.6ms budget
  step 5 exit 1 .................. merge BLOCKED
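
The final step, turning individual assertion results into one exit code, might look like the sketch below. The runGate function and the step shape are hypothetical; in a real pipeline each step would wrap an actual tool invocation rather than a hard-coded comparison. The two sample steps reuse the numbers from the run above.

```javascript
// Each step is { name, run }, where run() returns an array of failure
// messages (empty when the step passes). This shape is an assumption.
function runGate(steps) {
  const report = [];
  for (const step of steps) {
    const failures = step.run();
    report.push({ name: step.name, ok: failures.length === 0, failures });
  }
  // Any failed step makes the whole gate fail.
  const exitCode = report.some((r) => !r.ok) ? 1 : 0;
  return { exitCode, report };
}

const gate = runGate([
  { name: 'lhci assert', run: () => (270 > 200 ? ['total-blocking-time 270 > 200'] : []) },
  { name: 'wpt longtask assert', run: () => (92 > 16.6 ? ['longest task 92ms > 16.6ms budget'] : []) },
]);

console.log(gate.exitCode === 0 ? 'merge allowed' : 'merge BLOCKED');
// In a Node.js CI script, the result would then be applied with:
//   process.exitCode = gate.exitCode;
```

This sketch runs every step instead of stopping at the first failure, so the report lists all failed assertions at once, as in the run above where both the Lighthouse CI and WebPageTest assertions fail.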

Metric Targets

metric	target	how measured
TBT	< 200ms	Lighthouse CI median of 5
LCP	< 2.5s	Lighthouse CI assertion
CLS	< 0.1	Lighthouse CI assertion
Longest task in a flow	< 16.6ms	WebPageTest trace extraction
Script transfer size	< 170KB	budget.json resource budget

With these gates in place, regressions surface on the pull request that caused them. The lab numbers asserted here are the same ones the field observers in Core Web Vitals Measurement confirm once the change reaches real users.