From 706 Over-Tests to 52 Right-Sized Tests: Building a Scalable Test Infrastructure
Testing is supposed to give you confidence. But when I looked at our test suite, I didn't feel confident—I felt trapped. We had 706 end-to-end tests validating a 224-line Zod schema. That's a 5-to-1 test-to-code ratio. Every schema change meant updating hundreds of tests. Development velocity had ground to a halt.
Something had to change.
In this post, I'll walk through how we diagnosed our over-testing problem, established proper test boundaries, and built a scalable test infrastructure that actually accelerates development instead of slowing it down.
The Test Ratio Problem
Let's start with the numbers that made me realize we had a problem:
Before:
- 706 E2E tests for frontmatter validation
- 224 lines of schema code
- Test-to-code ratio: 5-to-1
- Test execution time: ~45 seconds
- Maintenance burden: Every schema change required updating 50-100 tests
After:
- 20 unit tests for schema validation
- 32 E2E tests for integration scenarios
- Test-to-code ratio: 1-to-1
- Test execution time: less than 2 seconds (unit), ~8 seconds (E2E)
- Maintenance: Schema changes update 1-3 tests
What Went Wrong?
The core issue was using the wrong type of test for the job. We were using Playwright (an end-to-end testing framework) to test Zod schema validation, which is a pure function call and belongs in unit tests.
Here's what that looked like:
// Bad: E2E test for schema validation (from frontmatter-validation.spec.ts)
test('title must be a string', async ({ page }) => {
  await page.goto('/blog/test-post')
  // Navigate to admin, submit invalid frontmatter, check error message
  // 10+ lines of page navigation and DOM assertions
  // Just to test: typeof title === 'string'
})
// Repeated 706 times for every field, every validation rule
test('title cannot be empty', async ({ page }) => { /* ... */ })
test('title must be at least 1 character', async ({ page }) => { /* ... */ })
test('description must be a string', async ({ page }) => { /* ... */ })
// ... 702 more tests
The problem compounds:
- Every run requires starting the dev server (~3 seconds overhead)
- Each test requires browser automation (~0.5 seconds per test)
- Changes to unrelated code can break tests (brittle selectors)
- Slow tests mean fewer iterations, slower development
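To make the scaling concrete, here is a rough cost model using the per-test and startup figures above. It is only a sketch: it assumes the dev server starts once per run and that tests are spread evenly over parallel workers, and the worker count of 8 is my own illustrative value, not a measured one.

```typescript
// Rough cost model for an E2E suite. All inputs are illustrative.
interface SuiteCost {
  testCount: number       // number of E2E tests in the suite
  perTestSeconds: number  // browser automation cost per test
  startupSeconds: number  // dev server startup, paid once per run (assumption)
  workers: number         // parallel workers (assumption, not a measured value)
}

function estimateSuiteSeconds(cost: SuiteCost): number {
  // Startup is serial; test time is divided evenly across workers.
  return cost.startupSeconds + (cost.testCount * cost.perTestSeconds) / cost.workers
}

const oldSuite = estimateSuiteSeconds({
  testCount: 706,
  perTestSeconds: 0.5,
  startupSeconds: 3,
  workers: 8,
})
console.log(oldSuite.toFixed(1)) // 47.1
```

Under those assumptions the estimate lands in the same ballpark as the ~45 seconds we measured, and it shows why the cost grows with every test added: the per-test term dominates.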
The Right Way: Unit vs E2E Boundaries
The solution was establishing clear boundaries between unit and E2E tests using a simple rule:
If you're calling a function directly → Unit test
If you're using page.goto() → E2E test
Here's the same validation logic with proper unit tests:
// Good: Unit test for schema validation
import { describe, it, expect } from 'vitest'
import { FrontmatterSchema } from '@/lib/schemas/frontmatter'
describe('Frontmatter Schema', () => {
  it('validates required fields', () => {
    const invalid = { title: 'Test' } // Missing description, date, tags
    expect(() => FrontmatterSchema.parse(invalid)).toThrow()
  })
  it('enforces description length limit', () => {
    const tooLong = {
      title: 'Test',
      description: 'a'.repeat(161), // Max 160 chars for SEO
      date: '2025-11-11',
      tags: ['testing']
    }
    expect(() => FrontmatterSchema.parse(tooLong)).toThrow(/160 characters/)
  })
  it('enforces tag taxonomy', () => {
    const invalidTag = {
      title: 'Test',
      description: 'Valid description',
      date: '2025-11-11',
      tags: ['invalid-tag'] // Not in ALLOWED_TAGS
    }
    expect(() => FrontmatterSchema.parse(invalidTag)).toThrow()
  })
})
Tests like these cover the validation logic that previously took 706 E2E tests. The full unit suite needs only 20 tests, and each one runs in milliseconds instead of seconds.
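For readers who want to see the rules themselves, here is a dependency-free stand-in for what those tests exercise. To be clear, this is not our Zod schema: it is a hand-written sketch in plain TypeScript, the function name `parseFrontmatter` is hypothetical, and the contents of `ALLOWED_TAGS` are assumed for illustration.

```typescript
// Hypothetical stand-in for the rules the unit tests exercise.
// Plain TypeScript, NOT the real Zod schema; ALLOWED_TAGS values are assumed.
const ALLOWED_TAGS: readonly string[] = ['testing', 'typescript']

interface Frontmatter {
  title: string
  description: string
  date: string
  tags: string[]
}

function parseFrontmatter(input: Record<string, unknown>): Frontmatter {
  const { title, description, date, tags } = input
  if (typeof title !== 'string' || title.length === 0) {
    throw new Error('title is required')
  }
  if (typeof description !== 'string') {
    throw new Error('description is required')
  }
  if (description.length > 160) {
    throw new Error('description must be at most 160 characters')
  }
  if (typeof date !== 'string') {
    throw new Error('date is required')
  }
  if (!Array.isArray(tags)) {
    throw new Error('tags are required')
  }
  const checkedTags: string[] = []
  for (const tag of tags) {
    // Every tag must be a string from the allowed taxonomy.
    if (typeof tag !== 'string' || !ALLOWED_TAGS.includes(tag)) {
      throw new Error(`unknown tag: ${String(tag)}`)
    }
    checkedTags.push(tag)
  }
  return { title, description, date, tags: checkedTags }
}
```

The point is that every rule is reachable through one direct function call, which is exactly the situation the unit-test rule above describes.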
The Solution: Vitest + Playwright
We set up a dual-testing infrastructure using the right tool for each job:
Vitest for Unit Tests
Use case: Pure functions, business logic, schema validation, utility functions
Configuration (vitest.config.ts):
import { defineConfig } from 'vitest/config'
import react from '@vitejs/plugin-react'
import path from 'path'
export default defineConfig({
  plugins: [react()],
  test: {
    environment: 'jsdom',
    globals: true,
    setupFiles: ['./tests/setup.ts'],
  },
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src'),
    },
  },
})
Benefits:
- Fast: 20 tests run in less than 2 seconds
- Isolated: Each test runs independently, no shared state
- Focused: Test one thing at a time
- Maintainable: Direct function calls, no DOM navigation
Playwright for E2E Tests
Use case: User workflows, integration points, cross-browser behavior, progressive enhancement
Configuration (playwright.config.ts):
import { defineConfig, devices } from '@playwright/test'
export default defineConfig({
  testDir: './tests/e2e',
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    { name: 'mobile', use: { ...devices['iPhone 13'] } },
  ],
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
  },
})
Benefits:
- Realistic: Tests actual user interactions in real browsers
- Cross-browser: Catches Safari/Firefox-specific issues
- Integration: Validates entire stack (server + client + rendering)
- Visual: Screenshots on failure, plus traces on the first retry
Test Patterns We Use
Pattern 1: Schema Validation (Unit Test)
When testing Zod schemas, focus on business rules, not Zod internals:
// Good: Test business logic
describe('Frontmatter Schema', () => {
  it('requires minimum fields for valid post', () => {
    const validPost = {
      title: 'My Post',
      description: 'A great post about testing',
      date: '2025-11-11',
      tags: ['testing', 'typescript']
    }
    const result = FrontmatterSchema.parse(validPost)
    // toMatchObject rather than toEqual: the parsed result also carries default values
    expect(result).toMatchObject(validPost)
  })
  it('applies smart defaults', () => {
    const minimal = {
      title: 'Test',
      description: 'Test post',
      date: '2025-11-11',
      tags: ['testing']
    }
    const result = FrontmatterSchema.parse(minimal)
    expect(result.draft).toBe(false) // Default value
    expect(result.featured).toBe(false) // Default value
    expect(result.toc).toBe(true) // Default value
  })
})
Don't test Zod's implementation:
// Bad: Testing library internals
it('title field exists', () => {
  expect(FrontmatterSchema.shape.title).toBeDefined() // Don't do this
})
it('title is a ZodString', () => {
  expect(FrontmatterSchema.shape.title).toBeInstanceOf(z.ZodString) // Don't do this
})
Pattern 2: User Navigation (E2E Test)
Test user-visible behavior across the full stack:
import { test, expect } from '@playwright/test'
test.describe('Blog Post Navigation', () => {
  test('should navigate from homepage to blog post', async ({ page }) => {
    // Start at homepage
    await page.goto('/')
    // Find first blog post link
    const firstPost = page.locator('article a').first()
    const postTitle = await firstPost.textContent()
    // Click and navigate
    await firstPost.click()
    // Verify navigation
    await expect(page).toHaveURL(/\/blog\/[\w-]+/)
    await expect(page.locator('h1')).toContainText(postTitle!)
  })
  test('should display post metadata', async ({ page }) => {
    await page.goto('/blog/building-scalable-test-infrastructure')
    // Check metadata visible
    await expect(page.locator('time')).toBeVisible()
    await expect(page.locator('[data-testid="reading-time"]')).toContainText(/\d+ min/)
    await expect(page.locator('[data-testid="tags"]')).toBeVisible()
  })
})
Pattern 3: Progressive Enhancement (E2E Test)
Validate that features work without JavaScript:
test.describe('Progressive Enhancement', () => {
  // Disable JavaScript for every test in this block.
  // Playwright turns JavaScript off through the javaScriptEnabled context option.
  test.use({ javaScriptEnabled: false })
  test('blog post readable without JavaScript', async ({ page }) => {
    // Navigate to post
    await page.goto('/blog/my-post')
    // Content should still be visible
    await expect(page.locator('h1')).toBeVisible()
    await expect(page.locator('article')).toContainText('My Post')
    // Code blocks should render (first() avoids a strict-mode error when a post has several)
    await expect(page.locator('pre code').first()).toBeVisible()
  })
  test('sidenote links work without JavaScript', async ({ page }) => {
    await page.goto('/blog/post-with-sidenotes')
    // Footnote link should navigate to footnote section
    const footnoteLink = page.locator('a[href^="#fn-"]').first()
    await footnoteLink.click()
    // Should jump to footnote
    await expect(page).toHaveURL(/#fn-\d+/)
  })
})
Pattern 4: Bundle Size Enforcement (E2E Test)
Ensure performance budgets are met:
test.describe('Bundle Size Budget', () => {
  test('baseline post ships 0 KB JavaScript', async ({ page }) => {
    // Post without islands should have zero client JS
    await page.goto('/blog/simple-post')
    // Measure JavaScript transferred
    const jsSize = await page.evaluate(() => {
      const resources = performance.getEntriesByType('resource') as PerformanceResourceTiming[]
      return resources
        .filter(r => r.name.endsWith('.js'))
        .reduce((acc, r) => acc + (r.transferSize || 0), 0)
    })
    expect(jsSize).toBe(0) // 0 KB baseline
  })
  test('post with progress island stays under budget', async ({ page }) => {
    await page.goto('/blog/post-with-progress')
    const jsSize = await page.evaluate(() => {
      const resources = performance.getEntriesByType('resource') as PerformanceResourceTiming[]
      return resources
        .filter(r => r.name.endsWith('.js'))
        .reduce((acc, r) => acc + (r.transferSize || 0), 0)
    })
    // Progress bar island ~2 KB (budget: 3 KB)
    expect(jsSize).toBeLessThan(3 * 1024)
  })
})
Pattern 5: TDD for Islands (Write Test First)
When implementing new client islands, write the test first:
// Step 1: Write failing test (use .skip initially)
test.skip('copy button adds to clipboard', async ({ page, context }) => {
  // Reading the clipboard needs explicit permission; these permission names work in Chromium only
  await context.grantPermissions(['clipboard-read', 'clipboard-write'])
  await page.goto('/blog/my-post')
  // Find first code block copy button
  const copyButton = page.locator('pre button[aria-label="Copy code"]').first()
  await copyButton.click()
  // Verify clipboard content
  const clipboardText = await page.evaluate(() => navigator.clipboard.readText())
  expect(clipboardText).toContain('function example()')
})
// Step 2: Implement the island component
// Step 3: Remove .skip and verify test passes
This TDD approach ensures:
- Feature requirements are clear before implementation
- No over-engineering (only build what the test requires)
- Immediate feedback when implementation is complete
Results: Before vs After
Quantitative Improvements
| Metric | Before (706 E2E) | After (20 Unit + 32 E2E) | Improvement |
|---|---|---|---|
| Total Tests | 706 | 52 | -92.6% |
| Test-to-Code Ratio | 5-to-1 | 1-to-1 | 5x better |
| Execution Time | ~45s | ~10s | 4.5x faster |
| Lines of Test Code | ~18,000 | ~1,300 | -92.8% |
| Maintenance Time | ~2 hours/change | ~5 min/change | 24x faster |
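The improvement column is plain arithmetic on the before and after figures. A quick sketch for checking it (the helper names are mine, not part of our codebase):

```typescript
// Percentage change from `before` to `after`, rounded to one decimal place.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 1000) / 10
}

// How many times smaller `after` is than `before`.
function speedup(before: number, after: number): number {
  return before / after
}

console.log(percentChange(706, 52))     // total tests: -92.6
console.log(percentChange(18000, 1300)) // lines of test code: -92.8
console.log(speedup(45, 10))            // execution time in seconds: 4.5
console.log(speedup(120, 5))            // maintenance time in minutes: 24
```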
Qualitative Improvements
Developer Experience:
- ✅ Tests run in seconds, not minutes
- ✅ Clear test failures point directly to issue
- ✅ Refactoring no longer breaks 100 unrelated tests
- ✅ Can iterate rapidly on new features
Confidence:
- ✅ Unit tests catch logic errors immediately
- ✅ E2E tests validate integration across browsers
- ✅ Tests actually document expected behavior
- ✅ No more "passing tests, broken feature" scenarios
Maintainability:
- ✅ Adding new schema fields requires 1-2 new tests
- ✅ Refactoring doesn't require test rewrites
- ✅ Tests are self-documenting
- ✅ New developers understand test structure immediately
Real-World Impact: Migration M3L-39
When we migrated from the 706-test approach to 52 right-sized tests (Linear ticket M3L-39), we saw immediate benefits:
Week 1 (706 E2E tests):
- Add new frontmatter field → Update 47 tests
- Time spent: 2.3 hours
- Test execution: 45 seconds per run
- Total iterations: 6 (testing changes)
- Total time: ~3 hours
Week 2 (20 unit + 32 E2E tests):
- Add new frontmatter field → Update 2 tests
- Time spent: 8 minutes
- Test execution: 10 seconds per run
- Total iterations: 6 (testing changes)
- Total time: ~15 minutes
That's a 12x improvement in productivity for a common task.
Best Practices: Choosing the Right Test Type
Decision Matrix
Use this simple flowchart when writing tests:
Are you calling a function directly?
├─ YES → Unit Test (Vitest)
└─ NO → Are you testing user interaction?
    ├─ YES → E2E Test (Playwright)
    └─ NO → Do you need it in a real browser?
        ├─ YES → E2E Test (Playwright)
        └─ NO → Unit Test (Vitest)
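If it helps to see the flowchart as code, here is one way to write it down. The type and field names are hypothetical; the branch order mirrors the chart.

```typescript
type TestType = 'unit' | 'e2e'

// The three questions from the flowchart, answered for one planned test.
interface TestTraits {
  callsFunctionDirectly: boolean
  testsUserInteraction: boolean
  needsRealBrowser: boolean
}

function chooseTestType(traits: TestTraits): TestType {
  // Question 1: direct function calls are always unit tests.
  if (traits.callsFunctionDirectly) return 'unit'
  // Question 2: user interaction means E2E.
  if (traits.testsUserInteraction) return 'e2e'
  // Question 3: anything else is E2E only if it needs a real browser.
  return traits.needsRealBrowser ? 'e2e' : 'unit'
}

// Schema validation: a direct call, so a unit test.
console.log(chooseTestType({
  callsFunctionDirectly: true,
  testsUserInteraction: false,
  needsRealBrowser: false,
})) // unit
```

Note that the first question wins: a schema check stays a unit test even if someone could also reach it through the browser.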
When to Use Unit Tests
✅ Good use cases:
- Schema validation (Zod, TypeScript types)
- Utility functions (date formatting, string manipulation)
- Business logic (reading time calculation, sorting)
- Pure components (no side effects)
❌ Bad use cases:
- Testing browser APIs (localStorage, fetch)
- Testing navigation flows
- Testing CSS/layout
- Testing third-party integrations
When to Use E2E Tests
✅ Good use cases:
- User workflows (homepage → blog post → navigation)
- Cross-browser behavior (Safari quirks)
- Progressive enhancement (works without JavaScript?)
- Integration (MDX + frontmatter + styling all working together)
❌ Bad use cases:
- Testing Zod schema fields (use unit tests)
- Testing pure functions (use unit tests)
- Testing framework internals (trust the library)
Maintaining Test Ratios
Target a test-to-code ratio between 1:1 and 2:1, measured in lines:
// Good: 100 lines of code → 100-200 lines of test code
src/lib/schemas/frontmatter.ts (224 lines)
tests/unit/schemas/frontmatter.test.ts (180 lines)
Ratio: ~1-to-1 ✅
// Bad: 100 lines of code → 500 lines of test code
src/lib/schemas/frontmatter.ts (224 lines)
tests/e2e/frontmatter-validation.spec.ts (18,000 lines)
Ratio: ~80-to-1 ❌
If your ratio exceeds 3:1, you're probably over-testing.
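Those thresholds are easy to turn into a small check. This is a sketch, not something from our repo: the function names are made up, and the 'borderline' label for the gap between 2:1 and 3:1 is my own naming.

```typescript
type RatioVerdict = 'on-target' | 'borderline' | 'over-testing'

// Lines of test code per line of source code.
function testToCodeRatio(testLines: number, sourceLines: number): number {
  if (sourceLines <= 0) throw new RangeError('sourceLines must be positive')
  return testLines / sourceLines
}

// Up to 2:1 is the target range; above 3:1 is the over-testing signal.
// This sketch does not try to flag under-testing.
function classifyRatio(ratio: number): RatioVerdict {
  if (ratio > 3) return 'over-testing'
  if (ratio > 2) return 'borderline'
  return 'on-target'
}

// The unit test file from the example above: 180 test lines for 224 source lines.
const unitRatio = testToCodeRatio(180, 224)
console.log(unitRatio.toFixed(2), classifyRatio(unitRatio)) // 0.80 on-target
```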
Making Tests Fast
Unit tests should run in milliseconds:
npm run test:unit
# PASS tests/unit/schemas/frontmatter.test.ts
# ✓ Frontmatter Schema (20 tests) 142ms
E2E tests should run in seconds:
npm run test:e2e
# PASS tests/e2e/navigation.spec.ts
# ✓ Blog Post Navigation (5 tests) 3.2s
Tips for speed:
- Use test.describe.parallel() for independent tests, or set fullyParallel: true as in the config above
- Reuse browser contexts when possible
- Mock external APIs in unit tests
- Use webServer.reuseExistingServer for local runs (the config above turns it off in CI)
Lessons Learned
1. Test Fatigue is Real
When you have 706 tests, you stop trusting them. You skip them locally ("they take too long"). You get desensitized to failures ("probably a flaky test"). You dread making changes ("which 50 tests will break this time?").
The fix: Fewer, better tests that run fast and fail meaningfully.
2. Test the Interface, Not the Implementation
Our 706 tests were testing Zod's implementation, not our business logic. When we switched to testing what the schema validates instead of how Zod validates it, we cut tests by about 93% (706 to 52) with zero loss in confidence.
3. Test Types Have Different Purposes
Unit tests give you fast feedback on logic. E2E tests give you confidence in integration.
Don't use E2E tests for unit-test purposes. You'll pay the cost in speed and maintenance.
4. TDD Works for the Right Problems
Test-driven development shines when:
- Requirements are clear (island specifications)
- Feedback cycle matters (rapid iteration)
- Regression risk is high (public API changes)
We use TDD for client islands and skip it for static content. Use the right tool for the job.
5. Tests Should Accelerate Development
If tests slow you down more than they speed you up, something is wrong. Our 706-test suite was a net negative: it prevented bugs, but at a massive productivity cost.
The 52-test suite is a net positive: it prevents bugs and accelerates development.
Conclusion
Fixing our over-testing problem transformed how we develop. We went from dreading test updates to relying on tests for fast feedback. The key insights:
Test Ratios Matter: Keep test-to-code ratios between 1-to-1 and 2:1. If you exceed 3:1, audit your tests.
Use the Right Tool: Unit tests (Vitest) for pure functions, E2E tests (Playwright) for integration. Don't use Playwright to test Zod schemas.
Speed is a Feature: Fast tests get run. Slow tests get skipped. Optimize for sub-second unit tests and sub-10-second E2E tests.
Focus on Value: Test user-visible behavior, not implementation details. Test what your code does, not how it does it.
Embrace TDD Selectively: Write tests first for islands and public APIs. Skip TDD for static content and one-off components.
Try It Yourself
Next time you're tempted to write an E2E test, ask:
- Am I calling a function directly? (→ Unit test)
- Am I using page.goto()? (→ E2E test)
Next time your test suite feels slow, audit your test ratios:
# Count lines of test code (unit *.test.ts and E2E *.spec.ts)
find tests \( -name "*.test.ts" -o -name "*.spec.ts" \) | xargs wc -l
# Count lines of source code
find src -name "*.ts" -not -name "*.test.ts" | xargs wc -l
# Calculate ratio: total test lines / total source lines
If your ratio exceeds 3:1, you probably have an over-testing problem.
Further Reading
- Testing Library Guiding Principles - "Test what users see"
- Kent C. Dodds: Write tests. Not too many. Mostly integration. - Balance unit vs integration
- Vitest Documentation - Fast unit testing
- Playwright Best Practices - Reliable E2E tests
Test ratios calculated from actual codebase metrics. See M3L-39 for migration details.