Frontend Testing Strategy at Scale: Confidence, Speed, and Trust in Large Codebases

A principal-level playbook for unit, integration, and e2e testing with Vitest, Testing Library, and Playwright—without a slow, flaky suite.

DEC 17, 2025 19 MIN READ

Why testing strategy matters more than tooling

At scale, the question is not “which test runner should we use?” but “how do we buy the most confidence per dollar of CI time and engineer attention?”

A suite that takes 45 minutes and flakes twice a week is worse than a smaller suite that runs in eight minutes and fails only when behavior actually regressed.

Senior and principal engineers are judged on systems: how tests are layered, how data is seeded, how flakiness is quarantined, and how PR feedback stays under ten minutes.

This post is a pragmatic playbook, not a manifesto for 100% coverage or “no mocks ever.”

The trophy vs the pyramid: what actually wins

Kent C. Dodds popularized the testing trophy: static analysis at the base, a modest layer of unit tests, integration tests as the largest layer, and a small cap of end-to-end (e2e) tests.

The classic testing pyramid weights the layers differently: many fast unit tests at the base, fewer integration tests, fewest e2e at the top.

Both diagrams are pedagogical shorthand; neither is a law of physics.

What matters is confidence per cost along three axes: speed, determinism, and signal quality (does a failure tell you what broke?).

Layer	Typical runtime	Best signal for	Main failure mode
Static (TypeScript, ESLint, `tsc`)	seconds	type/API misuse, a11y lint rules	false sense of security on runtime behavior
Unit	milliseconds–seconds	pure logic, edge cases, parsers	over-mocking; testing implementation
Component / integration	seconds	UI behavior + local state + mocked network	jsdom gaps vs real browser
E2E	minutes	critical user journeys, auth, routing	flakiness, slow feedback, env drift
Visual regression	minutes + review	layout/CSS regressions	snapshot noise, baseline maintenance

Principal takeaway: Optimize the shape of your suite for your product’s risk profile, not for a diagram on a slide. A B2B dashboard with complex client-side calculations may need more unit tests; a checkout flow with three payment providers needs ruthless e2e and contract tests.

Unit tests: pure logic and the mocking trap

Unit tests excel where inputs and outputs are explicit and side effects are bounded.

Worth unit-testing:

Pure functions: formatters, validators, reducers, selectors, pricing engines
Algorithms with edge cases: pagination math, retry backoff, diff utilities
Error mapping: HTTP status → user-facing copy, domain error codes
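
As a rough illustration of the last item, here is what a unit-testable error mapper might look like. The function name `mapHttpError`, the `ErrorCopy` shape, and the copy strings are all hypothetical; the point is only that every branch has an explicit input and output.

```typescript
// Hypothetical error-mapping helper. The copy strings are placeholders,
// not taken from any real product.
type ErrorCopy = { title: string; retryable: boolean };

function mapHttpError(httpStatus: number): ErrorCopy {
  if (httpStatus === 401 || httpStatus === 403) {
    return { title: 'You do not have access to this page.', retryable: false };
  }
  if (httpStatus === 404) {
    return { title: 'We could not find that item.', retryable: false };
  }
  if (httpStatus === 429) {
    return { title: 'Too many requests. Try again shortly.', retryable: true };
  }
  if (httpStatus >= 500 && httpStatus <= 599) {
    return { title: 'Something went wrong on our side.', retryable: true };
  }
  // Anything unmapped falls through to a generic, non-retryable message.
  return { title: 'Unexpected error.', retryable: false };
}

const serverError = mapHttpError(503);
const notFound = mapHttpError(404);
```

In a Vitest file, each branch would become one `it` with a single `expect`. No mocks are needed because the function has no collaborators.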

Usually not worth deep unit coverage:

Thin React wrappers that only compose hooks and render JSX
CSS layout (use visual or e2e instead)
Code that is essentially “call the API and show the result”; that belongs in integration/e2e

Vitest at scale

Vitest shares Vite’s transform pipeline, so test startup stays fast in monorepos that already use Vite.

// src/lib/format-currency.test.ts
import { describe, it, expect } from 'vitest';
import { formatCurrency } from './format-currency';

describe('formatCurrency', () => {
  it('formats zero without fractional noise', () => {
    expect(formatCurrency(0, 'USD')).toBe('$0.00');
  });

  it('rounds half-up for display', () => {
    expect(formatCurrency(10.005, 'USD')).toBe('$10.01');
  });
});

Use describe.concurrent and file-level parallelism carefully: shared global state (mutable module singletons, Date.now without faking) causes order-dependent failures.

import { vi, beforeEach, afterEach } from 'vitest';

beforeEach(() => {
  vi.useFakeTimers();
  vi.setSystemTime(new Date('2026-05-29T12:00:00Z'));
});

afterEach(() => {
  vi.useRealTimers();
});

Mocking pitfalls that erode trust

Mocks are a loan against future refactors: you pay interest when the real module changes but tests still pass.

Anti-pattern	Why it hurts
Mocking the module under test	You prove nothing about production behavior
Asserting call order on internal helpers	Breaks on harmless refactors
Snapshotting entire error objects with stack traces	Noise on every Node/V8 bump
`vi.mock` of deep dependency trees	Tests document the mock graph, not the app

Prefer dependency injection at boundaries (pass fetch, clock, storage) over blanket vi.mock.
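
A minimal sketch of that idea, assuming a hypothetical `Clock` interface and an `isTrialExpired` function (neither comes from a real codebase). The production default reads the system time; a test passes a fixed clock and needs neither vi.mock nor global fake timers.

```typescript
// Hypothetical boundary type: anything that can report the current time in ms.
type Clock = { now: () => number };

// Production default: reads the real system time.
const systemClock: Clock = { now: () => Date.now() };

// The clock is injected, so the function stays pure with respect to its inputs.
function isTrialExpired(trialEndsAtIso: string, clock: Clock = systemClock): boolean {
  return clock.now() >= Date.parse(trialEndsAtIso);
}

// In a test, pass a fixed clock instead of mocking modules.
const fixedClock: Clock = { now: () => Date.parse('2026-05-29T12:00:00Z') };
const expiredTrial = isTrialExpired('2026-05-01T00:00:00Z', fixedClock);
const activeTrial = isTrialExpired('2026-06-30T00:00:00Z', fixedClock);
```

The same shape works for fetch and storage: accept them as parameters with production defaults, and the test supplies small in-memory stand-ins.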

Rule of thumb: If deleting the implementation and replacing it with return 42 still lets the test pass, you are not testing behavior.

Component tests: Testing Library and the user-centric model

Component tests sit between unit and integration: real render tree, real event dispatch (mostly), often mocked network.

Testing Library’s philosophy is deliberate: query as the user would find things, interact as the user would, and assert on outcomes, not on internal state. Reach for data-testid only when role or text truly cannot work.

Priority order (simplified): getByRole → getByLabelText → getByPlaceholderText → getByText → last resort getByTestId.

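// CartSummary.test.tsx
import { it, expect } from 'vitest';
import '@testing-library/jest-dom/vitest'; // registers toHaveTextContent and related matchers
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { CartSummary } from './CartSummary';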

it('applies coupon and updates total', async () => {
  const user = userEvent.setup();
  render(<CartSummary subtotal={100} />);

  await user.type(screen.getByLabelText(/coupon code/i), 'SAVE10');
  await user.click(screen.getByRole('button', { name: /apply/i }));

  expect(screen.getByRole('status')).toHaveTextContent('$90.00');
});

Use @testing-library/user-event over fireEvent for realistic pointer/keyboard sequences. fireEvent dispatches a single synthetic event, so it can produce false passes: it skips the hover, focus, and key events a real interaction generates, and it can click elements a user could not reach, such as ones styled with pointer-events: none.

jsdom limits vs real-browser component testing

jsdom is not a browser: no layout, incomplete CSS, no IntersectionObserver unless polyfilled, and fetch/cookie semantics that can differ from a real browser.

That is acceptable for behavioral tests (forms, ARIA, conditional rendering). It is insufficient for layout-dependent behavior (sticky headers, scroll snapping, container queries).

Two escape hatches:

Vitest Browser Mode with @vitest/browser and the Playwright provider: same test file, real Chromium, still colocated with source.
Playwright component testing (@playwright/experimental-ct-react) when the component under test needs real layout or Web APIs.

Trade-off: real-browser component tests are slower and harder to debug locally; reserve them for components where jsdom false confidence is expensive (maps, rich text, drag-and-drop).

Integration tests: features across boundaries

An integration test proves that multiple units cooperate correctly (often a route, a data hook, and UI together) with the network controlled but not every leaf function mocked.

Example scope: “User opens /settings/billing, sees current plan from API, upgrades, sees success toast and updated plan.”

MSW (Mock Service Worker) for network

MSW intercepts requests at the network boundary, so the application exercises the same fetch/XHR code path as in production, unlike mocking axios.get inside the module.

// tests/msw/handlers/billing.ts
import { http, HttpResponse } from 'msw';

export const billingHandlers = [
  http.get('/api/billing/plan', () => {
    return HttpResponse.json({ plan: 'pro', seats: 5 });
  }),
  http.post('/api/billing/upgrade', async ({ request }) => {
    const body = (await request.json()) as { plan?: string }; // narrow the untyped JSON body
    if (body.plan !== 'enterprise') {
      return HttpResponse.json({ error: 'invalid_plan' }, { status: 400 });
    }
    return HttpResponse.json({ plan: 'enterprise', seats: 5 });
  }),
];

// BillingPage.integration.test.tsx
import { beforeAll, afterEach, afterAll } from 'vitest';
import { setupServer } from 'msw/node';
import { billingHandlers } from '../msw/handlers/billing';

const server = setupServer(...billingHandlers);

beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

onUnhandledRequest: 'error' is non-negotiable at scale: silent passthrough to real APIs creates flaky tests and accidental load on staging.

For GraphQL, use MSW’s graphql helpers; for streaming/SSE, prefer e2e or dedicated contract tests, because mocked streams often misrepresent backpressure and cancellation.

End-to-end testing with Playwright

E2E tests are your production-shaped safety net: real browser, real router, real cookies, real timing.

They are also the first place flakiness, slow CI, and “works on my machine” collide, so discipline matters more than syntax.

Auto-waiting, locators, and web-first assertions

Playwright automatically waits for actionability before acting: an element must be visible, stable, and enabled before a click, and visible, enabled, and editable before a fill.

Prefer user-facing locators:

await page.getByRole('link', { name: 'Billing' }).click();
await page.getByLabel('Email').fill('user@example.com');
await page.getByRole('button', { name: 'Sign in' }).click();

Avoid CSS/XPath chains tied to implementation (div.container > span:nth-child(3)).

Use web-first assertions that retry until timeout:

await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
await expect(page.getByTestId('seat-count')).toHaveText('5');

Never pad these with manual waitForTimeout calls: fixed sleeps hide race conditions and linearly increase suite time.

Fixtures, parallelization, and trace viewer

Playwright fixtures compose setup (authenticated page, seeded DB handle, feature flags) without copy-paste beforeEach blocks.

// tests/fixtures/auth.ts
import { test as base, expect, type Page } from '@playwright/test';

export const test = base.extend<{ authedPage: Page }>({
  authedPage: async ({ page }, use) => {
    await page.goto('/login');
    await page.getByLabel('Email').fill(process.env.E2E_USER!);
    await page.getByLabel('Password').fill(process.env.E2E_PASSWORD!);
    await page.getByRole('button', { name: 'Sign in' }).click();
    await expect(page).toHaveURL(/dashboard/);
    await use(page);
  },
});

Run tests in parallel (workers already run files in parallel; fullyParallel: true also parallelizes tests within a file), but isolate data per test: parallel tests must not share mutable accounts unless you use transactional rollback or unique tenant prefixes.
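
One way to get unique tenant prefixes is a small naming helper, sketched below. `tenantSlug` and its `runId` parameter are hypothetical; `workerIndex` is meant to mirror Playwright’s testInfo.workerIndex, and `runId` could be whatever identifier your CI exposes for the run.

```typescript
// Hypothetical helper that derives an isolated tenant name per run, worker, and test.
function tenantSlug(workerIndex: number, runId: string, testTitle: string): string {
  const safeTitle = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything that is not alphanumeric
    .replace(/^-+|-+$/g, '') // trim leading and trailing dashes
    .slice(0, 24); // keep the name short enough for typical column limits
  return `e2e-${runId}-w${workerIndex}-${safeTitle}`;
}

const inviteTenant = tenantSlug(0, 'r1', 'Invite flow');
const otherWorkerTenant = tenantSlug(1, 'r1', 'Invite flow');
```

Because the worker index and run id are part of the name, two workers running the same test never write to the same tenant, and leftover data from an earlier run is easy to identify and clean up.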

On failure, the trace viewer (trace: 'on-first-retry' or 'retain-on-failure') captures DOM snapshots, network, and console output, which is essential for debugging CI-only failures.

Avoiding flakiness: a checklist

Do	Don’t
Seed deterministic data via API or SQL fixtures	Rely on “whatever is in staging today”
Use `expect` with auto-retry	`page.waitForTimeout(3000)`
Stub third-party widgets you do not own	Load live Stripe/Maps in every test
One logical journey per test	Ten assertions after unrelated navigation
Quarantine flaky tests immediately	Retry forever without ownership

Network idle: Playwright’s docs discourage waitForLoadState('networkidle') as a readiness signal, and SPAs with websockets and analytics may never go idle. Assert on user-visible outcomes instead.

Visual regression testing

Visual tests catch what behavioral tests miss: misaligned grids, wrong font weight, dark-mode token regressions, overflow at specific breakpoints.

Three common approaches:

Jest/Vitest image snapshots (for example toMatchImageSnapshot from jest-image-snapshot on a canvas or screenshot): fast feedback, brutal noise on anti-aliasing and font rendering across OSes.
Playwright toHaveScreenshot(): built-in diff, threshold tuning, baselines stored per project and platform.
Hosted services (Chromatic, Percy, Argos): cloud rendering, review UI, baseline approval workflow; cost scales with snapshot count.

await expect(page).toHaveScreenshot('billing-page.png', {
  maxDiffPixelRatio: 0.01,
  mask: [page.getByTestId('live-clock')],
});

Maintenance cost is real: a single design-token change can fail hundreds of baselines. Mitigate by testing stable components and critical pages, masking dynamic regions (ads, timestamps, avatars from CDN), and running visual jobs on a single Linux CI image rather than on every developer laptop.

Principal stance: visual tests are product/design contracts, not a substitute for unit or e2e coverage.

Contract testing when services multiply

When the frontend talks to five microservices owned by four teams, e2e alone cannot tell you whether a deploy broke the API shape before users hit production.

Consumer-driven contract testing (Pact is the common tooling) records expectations from the frontend (consumer) and verifies providers against those contracts in their CI.

Brief flow:

Frontend test or pact test emits the expected request/response interactions.
Pact broker stores the contract.
Provider CI runs verification and fails if the response drifts without consumer approval.
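
The verification step can be pictured with a deliberately simplified stand-in. This is not Pact’s API: `Contract`, `findContractDrift`, and the field names are hypothetical, and real Pact contracts carry full request/response interactions with matchers. The sketch only shows why a renamed field fails on the provider side.

```typescript
// Simplified stand-in for a consumer contract: field name -> expected primitive type.
type FieldType = 'string' | 'number' | 'boolean';
type Contract = Record<string, FieldType>;

// Returns a list of human-readable problems; an empty list means the response conforms.
function findContractDrift(contract: Contract, response: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [field, expected] of Object.entries(contract)) {
    if (!(field in response)) {
      problems.push(`missing field "${field}"`);
    } else if (typeof response[field] !== expected) {
      problems.push(`field "${field}" is ${typeof response[field]}, expected ${expected}`);
    }
  }
  return problems;
}

const userContract: Contract = { userId: 'string', seats: 'number' };

// Extra fields are tolerated: the consumer only pins what it actually reads.
const conforming = findContractDrift(userContract, { userId: 'u_1', seats: 5, extra: true });

// A provider that renamed userId to user_id is caught before any browser starts.
const renamed = findContractDrift(userContract, { user_id: 'u_1', seats: 5 });
```

Note the asymmetry: the consumer lists only the fields it depends on, so providers stay free to add fields without breaking verification.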

This does not replace e2e; it removes an entire class of “backend renamed userId to user_id” failures from your slowest layer.

Skip contract testing if you have a single BFF with an OpenAPI schema enforced in CI on both sides; use the schema as the contract.

Accessibility testing in CI

Automated a11y checks catch roughly 30–50% of issues (estimates vary by source), but that slice includes missing labels, invalid ARIA, and color contrast failures that are cheap to prevent.

axe-core via @axe-core/playwright or jest-axe in component tests:

import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('dashboard has no critical a11y violations', async ({ page }) => {
  await page.goto('/dashboard');
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa'])
    .analyze();
  expect(results.violations.filter(v => v.impact === 'critical')).toEqual([]);
});

Run axe on representative routes and component states (empty, loading, error), not just once on Storybook’s happy path.

Automated scans do not replace keyboard walkthroughs or screen reader validation for complex widgets (comboboxes, date pickers).
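
When an axe check fails in CI, a bare `toEqual([])` diff can be hard to read. A small formatter like the one below may help; `blockingViolations` is a hypothetical helper, and the `Violation` type only mirrors the handful of axe-core result fields used here. Which impact levels block a merge is a team decision, so it is a parameter.

```typescript
// Minimal shape mirroring the axe-core violation fields this sketch reads.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';
type Violation = { id: string; impact?: Impact | null; help: string; nodes: unknown[] };

// Returns one readable line per violation whose impact is in the blocking set.
function blockingViolations(violations: Violation[], blocking: Impact[] = ['critical']): string[] {
  return violations
    .filter((v) => v.impact != null && blocking.includes(v.impact))
    .map((v) => `${v.id} (${v.impact}): ${v.help} [${v.nodes.length} node(s)]`);
}

// Illustrative data only, shaped like axe output.
const sampleViolations: Violation[] = [
  { id: 'color-contrast', impact: 'serious', help: 'Elements must meet contrast thresholds', nodes: [{}, {}] },
  { id: 'label', impact: 'critical', help: 'Form elements must have labels', nodes: [{}] },
  { id: 'region', impact: 'moderate', help: 'Content should be contained by landmarks', nodes: [{}] },
];

const criticalOnly = blockingViolations(sampleViolations);
const criticalAndSerious = blockingViolations(sampleViolations, ['critical', 'serious']);
```

In a Playwright test, asserting that the returned array equals `[]` prints the rule ids and node counts directly in the failure message.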

Test data, factories, seeding, and environments

Flaky e2e is often a data problem, not a Playwright problem.

Patterns that scale:

Factories (e.g. @faker-js/faker with fixed seed in CI) generate entities with sensible defaults; tests override only fields they assert on.
API seeding before browser steps: creating a user and subscription via request.newContext() is faster than clicking through an admin UI.
Idempotent cleanup or isolated tenants (e2e-${testInfo.workerIndex}-${Date.now()}) prevent cross-test pollution.

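import { test, expect } from '@playwright/test';

test('invite flow', async ({ page, request }) => {
  const email = `invite-${Date.now()}@example.test`;
  // Seed through the API and fail fast if seeding did not succeed.
  const seeded = await request.post('/api/test/seed-user', { data: { email, role: 'admin' } });
  expect(seeded.ok()).toBeTruthy();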
  await page.goto('/team/invite');
  // ...
});
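
The factory pattern from the list above can be sketched without any dependency. To keep the example self-contained, it swaps @faker-js/faker for a tiny hand-rolled seeded generator (a mulberry32-style PRNG); `createUserFactory` and the `TestUser` shape are hypothetical.

```typescript
// Small deterministic PRNG (mulberry32-style), used here in place of a seeded faker instance.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

type TestUser = { id: string; email: string; role: 'admin' | 'member'; seats: number };

// Returns a builder: sensible defaults, with per-test overrides applied last.
function createUserFactory(seed: number) {
  const rand = seededRandom(seed);
  let sequence = 0;
  return (overrides: Partial<TestUser> = {}): TestUser => {
    sequence += 1;
    const suffix = Math.floor(rand() * 1000000);
    return {
      id: `user-${sequence}`,
      email: `user-${sequence}-${suffix}@example.test`,
      role: 'member',
      seats: 1 + Math.floor(rand() * 10),
      ...overrides,
    };
  };
}

const buildUser = createUserFactory(42);
const adminUser = buildUser({ role: 'admin' }); // the test states only the field it cares about
```

With a fixed seed, a failing CI run can be reproduced locally with identical data, and the override argument keeps each test’s intent visible at the call site.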

Environment matrix:

Env	Purpose	Data
Local	fast feedback, MSW-heavy	synthetic
CI ephemeral	PR gates, parallel shards	seeded per run
Staging	nightly e2e, contract publish	shared, refreshed nightly
Production	synthetic monitoring only	read-only probes

Never point PR e2e at shared staging without isolation; concurrent runs from merge queues will step on each other.

CI strategy: fast PRs, thorough nights

The goal is tiered execution: cheap checks on every push, expensive checks on schedule or before release.

What runs on PR

Lint, typecheck, unit + integration (Vitest), affected-project detection in monorepos
Smoke e2e: 5–15 critical paths, sharded across workers
Target: < 10 minutes p95 for developer flow

Sharding and caching

Playwright supports --shard=1/4 to split spec files across machines.
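
To build intuition for what an index/total shard spec means, here is a toy partitioner. It is an illustration only: `filesForShard` is hypothetical, and Playwright’s own assignment of tests to shards may differ from this round-robin split.

```typescript
// Toy model of "--shard=index/total": every machine computes the same deterministic split
// and runs only its own slice.
function filesForShard(files: string[], shardIndex: number, shardTotal: number): string[] {
  if (shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError(`shard ${shardIndex}/${shardTotal} is out of range`);
  }
  const sorted = [...files].sort(); // identical order on every machine
  return sorted.filter((_, i) => i % shardTotal === shardIndex - 1);
}

const demoSpecs = ['c.spec.ts', 'a.spec.ts', 'e.spec.ts', 'b.spec.ts', 'd.spec.ts'];
const firstShard = filesForShard(demoSpecs, 1, 4); // ['a.spec.ts', 'e.spec.ts']
```

The property that matters is that the shards are disjoint and together cover the whole suite. Balance is a separate problem: if one spec file dominates runtime, splitting that file usually helps more than adding shards.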

Cache:

npm/pnpm store, Playwright browser binaries
Vite/Vitest transform cache when safe
Do not cache test results across commits without content-addressed keys tied to lockfile + source hash

Nightly and pre-release

Full e2e matrix (browsers you actually support, not every WebKit version ever)
Visual baseline update jobs (with human approval)
Performance budgets, bundle size, optional chaos on staging

Quarantining flaky tests

A flaky test is a production incident waiting in queue.

Process:

@flaky tag or dedicated quarantine project in CI
Ticket with owner and SLA to fix or delete within N days
Block new merges that increase quarantine count
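
The last rule can be enforced mechanically. Below is a minimal sketch of such a gate, assuming CI can produce the list of quarantined test ids on the base branch and on the PR branch; `quarantineGate` and `GateResult` are hypothetical names.

```typescript
type GateResult = { allowed: boolean; reason: string };

// Compares quarantine lists and rejects the PR if the total count grew.
function quarantineGate(baseQuarantined: string[], prQuarantined: string[]): GateResult {
  const base = new Set(baseQuarantined);
  const added = prQuarantined.filter((id) => !base.has(id));
  const growth = prQuarantined.length - baseQuarantined.length;
  if (growth > 0) {
    return { allowed: false, reason: `quarantine grew by ${growth}; newly quarantined: ${added.join(', ')}` };
  }
  return { allowed: true, reason: 'quarantine count did not grow' };
}

const grew = quarantineGate(['cart.spec:coupon'], ['cart.spec:coupon', 'login.spec:sso']);
const shrank = quarantineGate(['cart.spec:coupon', 'login.spec:sso'], ['cart.spec:coupon']);
```

Gating on the count rather than on the exact set leaves room to swap one quarantined test for another, which is the escape valve a team needs during an incident.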

Playwright retries (retries: 2 on CI) are a band-aid for infra blips, not for race conditions. Fix the race.

Coverage: signal, not goal

Istanbul or V8 coverage in Vitest is useful for finding untested critical modules, not for gating at 90%.

Track coverage trends on payment, auth, and permission code; ignore coverage on generated files and static marketing pages.

A pragmatic strategy a principal would set

Here is a concrete policy template for a large frontend (50+ engineers, monorepo, daily deploys).

1. Define risk tiers

Tier	Examples	Required tests
P0	Login, checkout, permissions	e2e smoke on every PR + full nightly + contract
P1	Settings, integrations	integration + selective e2e
P2	Marketing, internal tools	unit + manual QA cadence
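
A tier table is easier to enforce when it lives in code that CI can read. The sketch below encodes the table above; the route prefixes, the `checksForRoute` helper, and the choice to default unknown routes to P2 are all assumptions made for illustration.

```typescript
type Tier = 'P0' | 'P1' | 'P2';
type Check =
  | 'unit'
  | 'integration'
  | 'e2e-smoke'
  | 'e2e-nightly'
  | 'e2e-selective'
  | 'contract'
  | 'manual-qa';

// Mirrors the "Required tests" column of the tier table.
const requiredChecksByTier: Record<Tier, Check[]> = {
  P0: ['e2e-smoke', 'e2e-nightly', 'contract'],
  P1: ['integration', 'e2e-selective'],
  P2: ['unit', 'manual-qa'],
};

// Hypothetical route-to-tier mapping; the first matching prefix wins.
const tierByRoutePrefix: Array<[string, Tier]> = [
  ['/login', 'P0'],
  ['/checkout', 'P0'],
  ['/settings', 'P1'],
  ['/integrations', 'P1'],
];

function checksForRoute(route: string): Check[] {
  const match = tierByRoutePrefix.find(([prefix]) => route.startsWith(prefix));
  // Assumption: anything unlisted is treated as P2.
  return requiredChecksByTier[match ? match[1] : 'P2'];
}

const checkoutChecks = checksForRoute('/checkout/payment');
```

Defaulting unknown routes to P2 is the permissive choice; a stricter policy would fail CI on any route that has not been assigned a tier.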

2. Colocation and ownership

Tests live next to source; the owning team fixes failures within 24h of main breakage.

No central “QA team” gate; the platform team provides harnesses (MSW helpers, auth fixtures, seed CLI).

3. The default stack (example)

Vitest + Testing Library + MSW for unit/integration
Playwright for e2e and trace-first debugging
axe in CI on P0/P1 routes
Pact (or OpenAPI diff) for service boundaries you do not own
Chromatic or Playwright screenshots for design-system primitives only, not every page

4. Metrics that matter

PR check duration p95
Main branch green rate
Flake rate per 1,000 test runs
Mean time to fix broken main
Escaped defects per release on P0 flows

Not: raw coverage percentage on the dashboard.
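
Two of these metrics are simple enough to compute from raw CI data. The sketch below uses the nearest-rank definition of a percentile and assumes a particular definition of a flake (a run that passed only after a retry); both choices, and the names `percentile`, `flakesPerThousand`, and `TestRun`, are assumptions rather than a standard.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

type TestRun = { passed: boolean; attempts: number };

// Assumption: a "flake" is a run that ended green but needed more than one attempt.
function flakesPerThousand(runs: TestRun[]): number {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((r) => r.passed && r.attempts > 1).length;
  return (flaky / runs.length) * 1000;
}

// PR check durations in minutes: one slow outlier dominates p95 but barely moves the median.
const prMinutes = [4, 6, 5, 7, 30, 8, 6, 5, 9, 6];
const p95Minutes = percentile(prMinutes, 95);
const medianMinutes = percentile(prMinutes, 50);
```

Tracking p95 instead of the mean is what surfaces the occasional 30-minute run that engineers actually remember.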

5. Cultural rules

Red CI blocks merge: no “retry until green” without investigation.
Deleting a flaky test requires replacing the behavior coverage it provided or explicitly deciding to drop it.
New features ship with tests at the lowest layer that yields honest confidence; escalate to e2e only when lower layers lie.

Closing: trust is the product

Your test suite is infrastructure users never see, but they feel every flake as a delayed fix and every missing test as an outage.

Optimize for trustworthy signal, tiered cost, and fast recovery when main breaks. The trophy, the pyramid, and the latest runner matter only insofar as they serve those three.

Start by measuring PR duration and flake rate this week; pick one P0 flow and trace it from unit → MSW integration → Playwright e2e; delete one test that mocks its way to green. That is strategy at scale, not another slide deck.