From 706 Over-Tests to 52 Right-Sized Tests: Building a Scalable Test Infrastructure

Testing is supposed to give you confidence. But when I looked at our test suite, I didn't feel confident—I felt trapped. We had 706 end-to-end tests validating a 224-line Zod schema. That's a 5-to-1 test-to-code ratio. Every schema change meant updating hundreds of tests. Development velocity had ground to a halt.

Something had to change.

In this post, I'll walk through how we diagnosed our over-testing problem, established proper test boundaries, and built a scalable test infrastructure that actually accelerates development instead of slowing it down.

The Test Ratio Problem

Let's start with the numbers that made me realize we had a problem:

Before:

  • 706 E2E tests for frontmatter validation
  • 224 lines of schema code
  • Test-to-code ratio: 5-to-1
  • Test execution time: ~45 seconds
  • Maintenance burden: Every schema change required updating 50-100 tests

After:

  • 20 unit tests for schema validation
  • 32 E2E tests for integration scenarios
  • Test-to-code ratio: 1-to-1
  • Test execution time: less than 2 seconds (unit), ~8 seconds (E2E)
  • Maintenance: Schema changes update 1-3 tests

What Went Wrong?

The core issue was using the wrong test type for the job. We were using Playwright (an end-to-end testing framework) to test Zod schema validation (a pure function call that belongs in unit tests).

Here's what that looked like:

// Bad: E2E test for schema validation (from frontmatter-validation.spec.ts)
test('title must be a string', async ({ page }) => {
  await page.goto('/blog/test-post')
  // Navigate to admin, submit invalid frontmatter, check error message
  // 10+ lines of page navigation and DOM assertions
  // Just to test: typeof title === 'string'
})
 
// Repeated 706 times for every field, every validation rule
test('title cannot be empty', async ({ page }) => { /* ... */ })
test('title must be at least 1 character', async ({ page }) => { /* ... */ })
test('description must be a string', async ({ page }) => { /* ... */ })
// ... 702 more tests

The problem compounds:

  1. Every run has to start the dev server (~3 seconds of overhead)
  2. Every test has to drive a real browser (~0.5 seconds per test)
  3. Changes to unrelated code can break tests (brittle selectors)
  4. Slow tests mean fewer iterations, slower development

The Right Way: Unit vs E2E Boundaries

The solution was establishing clear boundaries between unit and E2E tests using a simple rule:

  • If you're calling a function directly → Unit test
  • If you're using page.goto() → E2E test

Here's the same validation logic with proper unit tests:

// Good: Unit test for schema validation
import { describe, it, expect } from 'vitest'
import { FrontmatterSchema } from '@/lib/schemas/frontmatter'
 
describe('Frontmatter Schema', () => {
  it('validates required fields', () => {
    const invalid = { title: 'Test' } // Missing description, date, tags
    expect(() => FrontmatterSchema.parse(invalid)).toThrow()
  })
 
  it('enforces description length limit', () => {
    const tooLong = {
      title: 'Test',
      description: 'a'.repeat(161), // Max 160 chars for SEO
      date: '2025-11-11',
      tags: ['testing']
    }
    expect(() => FrontmatterSchema.parse(tooLong)).toThrow(/160 characters/)
  })
 
  it('enforces tag taxonomy', () => {
    const invalidTag = {
      title: 'Test',
      description: 'Valid description',
      date: '2025-11-11',
      tags: ['invalid-tag'] // Not in ALLOWED_TAGS
    }
    expect(() => FrontmatterSchema.parse(invalidTag)).toThrow()
  })
})

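These three tests show the pattern: each one checks a business rule through a direct function call and runs in milliseconds instead of seconds. The full unit suite covers the schema with 20 tests like these instead of 706.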
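One way to keep a unit suite this small is to put the rules in a case table and loop over it. The sketch below is not our actual schema: validateFrontmatter and ALLOWED_TAGS are hypothetical stand-ins for FrontmatterSchema, written as a plain function so the example runs without dependencies. With Vitest, the same table would typically feed it.each.

```typescript
// Hypothetical stand-in for the Zod schema: returns a list of error messages.
type Frontmatter = { title?: unknown; description?: unknown; date?: unknown; tags?: unknown }

const ALLOWED_TAGS = ['testing', 'typescript', 'webdev', 'devops'] // assumed taxonomy

function validateFrontmatter(input: Frontmatter): string[] {
  const errors: string[] = []
  if (typeof input.title !== 'string' || input.title.length === 0) {
    errors.push('title is required')
  }
  if (typeof input.description !== 'string') {
    errors.push('description is required')
  } else if (input.description.length > 160) {
    errors.push('description must be at most 160 characters')
  }
  if (typeof input.date !== 'string' || Number.isNaN(Date.parse(input.date))) {
    errors.push('date must be a valid date string')
  }
  if (!Array.isArray(input.tags) || input.tags.length === 0) {
    errors.push('tags are required')
  } else if (!input.tags.every(t => ALLOWED_TAGS.includes(t))) {
    errors.push('tags must come from the allowed list')
  }
  return errors
}

const valid = { title: 'Test', description: 'Valid description', date: '2025-11-11', tags: ['testing'] }

// One row per business rule: a label, an input, and the expected error count.
const cases: Array<{ name: string; input: Frontmatter; expectedErrors: number }> = [
  { name: 'accepts a complete post', input: valid, expectedErrors: 0 },
  { name: 'rejects missing fields', input: { title: 'Test' }, expectedErrors: 3 },
  { name: 'rejects long description', input: { ...valid, description: 'a'.repeat(161) }, expectedErrors: 1 },
  { name: 'rejects unknown tag', input: { ...valid, tags: ['invalid-tag'] }, expectedErrors: 1 },
]

// Rows whose actual error count differs from the expectation.
const failures = cases.filter(c => validateFrontmatter(c.input).length !== c.expectedErrors)
console.log(failures.length === 0 ? 'all cases pass' : failures.map(c => c.name))
```

Adding a rule then means adding one row to the table, which is roughly what "schema changes update 1-3 tests" looks like in practice.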

The Solution: Vitest + Playwright

We set up a dual-testing infrastructure using the right tool for each job:

Vitest for Unit Tests

Use case: Pure functions, business logic, schema validation, utility functions

Configuration (vitest.config.ts):

import { defineConfig } from 'vitest/config'
import react from '@vitejs/plugin-react'
import path from 'path'
 
export default defineConfig({
  plugins: [react()],
  test: {
    environment: 'jsdom',
    globals: true,
    setupFiles: ['./tests/setup.ts'],
  },
  resolve: {
    alias: {
      '@': path.resolve(__dirname, './src'),
    },
  },
})

Benefits:

  • Fast: 20 tests run in less than 2 seconds
  • Isolated: Each test runs independently, no shared state
  • Focused: Test one thing at a time
  • Maintainable: Direct function calls, no DOM navigation

Playwright for E2E Tests

Use case: User workflows, integration points, cross-browser behavior, progressive enhancement

Configuration (playwright.config.ts):

import { defineConfig, devices } from '@playwright/test'
 
export default defineConfig({
  testDir: './tests/e2e',
  fullyParallel: true,
  retries: process.env.CI ? 2 : 0,
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    { name: 'mobile', use: { ...devices['iPhone 13'] } },
  ],
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
  },
})

Benefits:

  • Realistic: Tests actual user interactions in real browsers
  • Cross-browser: Catches Safari/Firefox-specific issues
  • Integration: Validates entire stack (server + client + rendering)
  • Visual: Screenshots on failure, traces on first retry

Test Patterns We Use

Pattern 1: Schema Validation (Unit Test)

When testing Zod schemas, focus on business rules, not Zod internals:

// Good: Test business logic
describe('Frontmatter Schema', () => {
  it('requires minimum fields for valid post', () => {
    const validPost = {
      title: 'My Post',
      description: 'A great post about testing',
      date: '2025-11-11',
      tags: ['testing', 'typescript']
    }
 
    // toMatchObject, because parse() also returns the defaulted fields (draft, featured, toc)
    const result = FrontmatterSchema.parse(validPost)
    expect(result).toMatchObject(validPost)
  })
 
  it('applies smart defaults', () => {
    const minimal = {
      title: 'Test',
      description: 'Test post',
      date: '2025-11-11',
      tags: ['testing']
    }
 
    const result = FrontmatterSchema.parse(minimal)
    expect(result.draft).toBe(false) // Default value
    expect(result.featured).toBe(false) // Default value
    expect(result.toc).toBe(true) // Default value
  })
})

Don't test Zod's implementation:

// Bad: Testing library internals
it('title field exists', () => {
  expect(FrontmatterSchema.shape.title).toBeDefined() // Don't do this
})
 
it('title is a ZodString', () => {
  expect(FrontmatterSchema.shape.title).toBeInstanceOf(z.ZodString) // Don't do this
})

Pattern 2: User Navigation (E2E Test)

Test user-visible behavior across the full stack:

import { test, expect } from '@playwright/test'
 
test.describe('Blog Post Navigation', () => {
  test('should navigate from homepage to blog post', async ({ page }) => {
    // Start at homepage
    await page.goto('/')
 
    // Find first blog post link
    const firstPost = page.locator('article a').first()
    const postTitle = await firstPost.textContent()
 
    // Click and navigate
    await firstPost.click()
 
    // Verify navigation
    await expect(page).toHaveURL(/\/blog\/[\w-]+/)
    await expect(page.locator('h1')).toContainText(postTitle!)
  })
 
  test('should display post metadata', async ({ page }) => {
    await page.goto('/blog/building-scalable-test-infrastructure')
 
    // Check metadata visible
    await expect(page.locator('time')).toBeVisible()
    await expect(page.locator('[data-testid="reading-time"]')).toContainText(/\d+ min/)
    await expect(page.locator('[data-testid="tags"]')).toBeVisible()
  })
})

Pattern 3: Progressive Enhancement (E2E Test)

Validate that features work without JavaScript:

test.describe('Progressive Enhancement', () => {
  // JavaScript is a context option, so disable it for every test in this block
  test.use({ javaScriptEnabled: false })

  test('blog post readable without JavaScript', async ({ page }) => {
    // Navigate to post
    await page.goto('/blog/my-post')

    // Content should still be visible
    await expect(page.locator('h1')).toBeVisible()
    await expect(page.locator('article')).toContainText('My Post')

    // Code blocks should render (first() avoids a strict-mode error when there are several)
    await expect(page.locator('pre code').first()).toBeVisible()
  })

  test('sidenote links work without JavaScript', async ({ page }) => {
    await page.goto('/blog/post-with-sidenotes')

    // Footnote link should navigate to footnote section
    const footnoteLink = page.locator('a[href^="#fn-"]').first()
    await footnoteLink.click()

    // Should jump to footnote
    await expect(page).toHaveURL(/#fn-\d+/)
  })
})

Pattern 4: Bundle Size Enforcement (E2E Test)

Ensure performance budgets are met:

test.describe('Bundle Size Budget', () => {
  test('baseline post ships 0 KB JavaScript', async ({ page }) => {
    // Post without islands should have zero client JS
    await page.goto('/blog/simple-post')
 
    // Measure JavaScript transferred
    const jsSize = await page.evaluate(() => {
      const resources = performance.getEntriesByType('resource') as PerformanceResourceTiming[]
      return resources
        .filter(r => r.name.endsWith('.js'))
        .reduce((acc, r) => acc + (r.transferSize || 0), 0)
    })
 
    expect(jsSize).toBe(0) // 0 KB baseline
  })
 
  test('post with progress island stays under budget', async ({ page }) => {
    await page.goto('/blog/post-with-progress')
 
    const jsSize = await page.evaluate(() => {
      const resources = performance.getEntriesByType('resource') as PerformanceResourceTiming[]
      return resources
        .filter(r => r.name.endsWith('.js'))
        .reduce((acc, r) => acc + (r.transferSize || 0), 0)
    })
 
    // Progress bar island ~2 KB (budget: 3 KB)
    expect(jsSize).toBeLessThan(3 * 1024)
  })
})
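The byte counting in both tests can be pulled into one pure function, which also makes it unit-testable. This is a sketch under assumptions: totalJsBytes and ResourceEntry are hypothetical names, and the URLs are made up. It also strips query strings and hashes, which a plain endsWith('.js') check would miss.

```typescript
// Minimal shape of the fields used from PerformanceResourceTiming.
type ResourceEntry = { name: string; transferSize?: number }

// Sum transferred bytes for JavaScript resources. Query strings and hashes
// are stripped first so "/app.js?v=3" still counts as a script.
function totalJsBytes(entries: ResourceEntry[]): number {
  return entries
    .filter(entry => {
      const path = entry.name.replace(/[?#].*$/, '')
      return path.endsWith('.js') || path.endsWith('.mjs')
    })
    .reduce((sum, entry) => sum + (entry.transferSize ?? 0), 0)
}

// Made-up entries standing in for performance.getEntriesByType('resource').
const sample: ResourceEntry[] = [
  { name: 'https://example.com/assets/progress.js?v=3', transferSize: 2048 },
  { name: 'https://example.com/assets/styles.css', transferSize: 900 },
  { name: 'https://example.com/assets/vendor.mjs', transferSize: 512 },
  { name: 'https://example.com/assets/cached.js', transferSize: 0 },
]

const jsBytes = totalJsBytes(sample) // 2048 + 512 + 0 = 2560
const underBudget = jsBytes < 3 * 1024
console.log(jsBytes, underBudget)
```

Two caveats apply to the measurement itself. transferSize reports 0 for resources served from cache and for cross-origin resources that don't send Timing-Allow-Origin, so a zero does not always mean no script was loaded. And because the page.evaluate callback runs in the browser, a helper like this has to be inlined or injected there rather than referenced from the test file.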

Pattern 5: TDD for Islands (Write Test First)

When implementing new client islands, write the test first:

// Step 1: Write failing test (use .skip initially)
test.skip('copy button adds to clipboard', async ({ page, context }) => {
  // Reading the clipboard requires permission; Playwright can only grant
  // these clipboard permissions in Chromium
  await context.grantPermissions(['clipboard-read', 'clipboard-write'])
  await page.goto('/blog/my-post')

  // Find first code block copy button
  const copyButton = page.locator('pre button[aria-label="Copy code"]').first()
  await copyButton.click()

  // Verify clipboard content
  const clipboardText = await page.evaluate(() => navigator.clipboard.readText())
  expect(clipboardText).toContain('function example()')
})
 
// Step 2: Implement the island component
// Step 3: Remove .skip and verify test passes

This TDD approach ensures:

  • Feature requirements are clear before implementation
  • No over-engineering (only build what the test requires)
  • Immediate feedback when implementation is complete

Results: Before vs After

Quantitative Improvements

Metric              | Before (706 E2E) | After (20 Unit + 32 E2E) | Improvement
Total Tests         | 706              | 52                       | -92.6%
Test-to-Code Ratio  | 5-to-1           | 1-to-1                   | 5x better
Execution Time      | ~45s             | ~10s                     | 4.5x faster
Lines of Test Code  | ~18,000          | ~1,300                   | -92.8%
Maintenance Time    | ~2 hours/change  | ~5 min/change            | 24x faster
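
The Improvement column is plain arithmetic on the Before and After columns. A small sketch of how those figures are derived (percentChange, speedup, and roundTo1 are illustrative helper names, not part of the project):

```typescript
// Relative change from before to after, as a percentage (negative = reduction).
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100
}

// How many times faster (or smaller) the "after" value is.
function speedup(before: number, after: number): number {
  return before / after
}

// Round to one decimal place for display.
function roundTo1(n: number): number {
  return Math.round(n * 10) / 10
}

const totalTests = roundTo1(percentChange(706, 52))        // -92.6
const testCodeLines = roundTo1(percentChange(18000, 1300)) // -92.8
const executionTime = speedup(45, 10)                      // 4.5
const maintenanceTime = speedup(2 * 60, 5)                 // 24 (minutes per change)

console.log({ totalTests, testCodeLines, executionTime, maintenanceTime })
```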

Qualitative Improvements

Developer Experience:

  • ✅ Tests run in seconds, not minutes
  • ✅ Clear test failures point directly to issue
  • ✅ Refactoring no longer breaks 100 unrelated tests
  • ✅ Can iterate rapidly on new features

Confidence:

  • ✅ Unit tests catch logic errors immediately
  • ✅ E2E tests validate integration across browsers
  • ✅ Tests actually document expected behavior
  • ✅ No more "passing tests, broken feature" scenarios

Maintainability:

  • ✅ Adding new schema fields requires 1-2 new tests
  • ✅ Refactoring doesn't require test rewrites
  • ✅ Tests are self-documenting
  • ✅ New developers understand test structure immediately

Real-World Impact: Migration M3L-39

When we migrated from the 706-test approach to 52 right-sized tests (Linear ticket M3L-39), we saw immediate benefits:

Week 1 (706 E2E tests):

  • Add new frontmatter field → Update 47 tests
  • Time spent: 2.3 hours
  • Test execution: 45 seconds per run
  • Total iterations: 6 (testing changes)
  • Total time: ~3 hours

Week 2 (20 unit + 32 E2E tests):

  • Add new frontmatter field → Update 2 tests
  • Time spent: 8 minutes
  • Test execution: 10 seconds per run
  • Total iterations: 6 (testing changes)
  • Total time: ~15 minutes

That's a 12x improvement in productivity for a common task.

Best Practices: Choosing the Right Test Type

Decision Matrix

Use this simple flowchart when writing tests:

Are you calling a function directly?
├─ YES → Unit Test (Vitest)
└─ NO → Are you testing user interaction?
    ├─ YES → E2E Test (Playwright)
    └─ NO → Do you need it in a real browser?
        ├─ YES → E2E Test (Playwright)
        └─ NO → Unit Test (Vitest)
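
The same flowchart can be written as a function, which is handy for a review checklist or a lint-style script. The names here (chooseTestType, TestQuestion) are hypothetical; the branches mirror the flowchart one to one.

```typescript
type TestType = 'unit' | 'e2e'

interface TestQuestion {
  callsFunctionDirectly: boolean
  testsUserInteraction: boolean
  needsRealBrowser: boolean
}

// Walk the flowchart top to bottom; the first matching branch wins.
function chooseTestType(q: TestQuestion): TestType {
  if (q.callsFunctionDirectly) return 'unit' // Vitest
  if (q.testsUserInteraction) return 'e2e'   // Playwright
  return q.needsRealBrowser ? 'e2e' : 'unit'
}

// Schema validation: a direct function call, so a unit test.
const schemaCheck = chooseTestType({
  callsFunctionDirectly: true,
  testsUserInteraction: false,
  needsRealBrowser: false,
})

// Homepage to blog post navigation: user interaction, so an E2E test.
const navigationCheck = chooseTestType({
  callsFunctionDirectly: false,
  testsUserInteraction: true,
  needsRealBrowser: true,
})

console.log(schemaCheck, navigationCheck)
```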

When to Use Unit Tests

Good use cases:

  • Schema validation (Zod, TypeScript types)
  • Utility functions (date formatting, string manipulation)
  • Business logic (reading time calculation, sorting)
  • Pure components (no side effects)

Bad use cases:

  • Testing browser APIs (localStorage, fetch)
  • Testing navigation flows
  • Testing CSS/layout
  • Testing third-party integrations

When to Use E2E Tests

Good use cases:

  • User workflows (homepage → blog post → navigation)
  • Cross-browser behavior (Safari quirks)
  • Progressive enhancement (works without JavaScript?)
  • Integration (MDX + frontmatter + styling all working together)

Bad use cases:

  • Testing Zod schema fields (use unit tests)
  • Testing pure functions (use unit tests)
  • Testing framework internals (trust the library)

Maintaining Test Ratios

Target a test-to-code ratio between 1-to-1 and 2-to-1, measured in lines:

// Good: 100 lines of code → 100-200 lines of test code
src/lib/schemas/frontmatter.ts (224 lines)
tests/unit/schemas/frontmatter.test.ts (180 lines)
Ratio: ~1-to-1 ✅

// Bad: 100 lines of code → 500 lines of test code
src/lib/schemas/frontmatter.ts (224 lines)
tests/e2e/frontmatter-validation.spec.ts (18,000 lines)
Ratio: 5-to-1 ❌

If your ratio exceeds 3:1, you're probably over-testing.
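
As a rough sketch, the thresholds above fit in a few lines. testToCodeRatio and assessRatio are hypothetical helpers, and the middle band between 2:1 and 3:1 is my own labeling for the gap between the target and the warning threshold.

```typescript
type RatioVerdict = 'within target' | 'above target' | 'probably over-testing'

// Lines of test code per line of source code.
function testToCodeRatio(testLines: number, sourceLines: number): number {
  if (sourceLines <= 0) throw new RangeError('sourceLines must be positive')
  return testLines / sourceLines
}

// Thresholds from the text: up to 2:1 is the target, above 3:1 is a warning sign.
function assessRatio(ratio: number): RatioVerdict {
  if (ratio > 3) return 'probably over-testing'
  if (ratio > 2) return 'above target' // assumed label for the 2:1 to 3:1 band
  return 'within target'
}

// The unit test file from the example above: 180 test lines for 224 schema lines.
const frontmatterRatio = testToCodeRatio(180, 224)
const frontmatterVerdict = assessRatio(frontmatterRatio)

console.log(frontmatterRatio.toFixed(2), frontmatterVerdict)
```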

Making Tests Fast

Unit tests should run in milliseconds:

npm run test:unit
# PASS  tests/unit/schemas/frontmatter.test.ts
#   ✓ Frontmatter Schema (20 tests) 142ms

E2E tests should run in seconds:

npm run test:e2e
# PASS  tests/e2e/navigation.spec.ts
#   ✓ Blog Post Navigation (5 tests) 3.2s

Tips for speed:

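  • Run independent tests in parallel (fullyParallel: true in the config, or test.describe.configure({ mode: 'parallel' }) per file)
  • Reuse browser contexts when possible
  • Mock external APIs in unit tests
  • Use webServer.reuseExistingServer locally so repeated runs skip server startup (the config above turns it off in CI)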

Lessons Learned

1. Test Fatigue is Real

When you have 706 tests, you stop trusting them. You skip them locally ("they take too long"). You get desensitized to failures ("probably a flaky test"). You dread making changes ("which 50 tests will break this time?").

The fix: Fewer, better tests that run fast and fail meaningfully.

2. Test the Interface, Not the Implementation

Our 706 tests were testing Zod's implementation, not our business logic. When we switched to testing what the schema validates instead of how Zod validates it, we cut the test count by about 93% (706 to 52) with zero loss in confidence.

3. Test Types Have Different Purposes

Unit tests give you fast feedback on logic. E2E tests give you confidence in integration.

Don't use E2E tests for unit-test purposes. You'll pay the cost in speed and maintenance.

4. TDD Works for the Right Problems

Test-driven development shines when:

  • Requirements are clear (island specifications)
  • Feedback cycle matters (rapid iteration)
  • Regression risk is high (public API changes)

We use TDD for client islands and skip it for static content. Use the right tool for the job.

5. Tests Should Accelerate Development

If tests slow you down more than they speed you up, something is wrong. Our 706-test suite was a net negative: it prevented bugs, but at a massive productivity cost.

The 52-test suite is a net positive: it prevents bugs and accelerates development.

Conclusion

Fixing our over-testing problem transformed how we develop. We went from dreading test updates to relying on tests for fast feedback. The key insights:

Test Ratios Matter: Keep test-to-code ratios between 1-to-1 and 2:1. If you exceed 3:1, audit your tests.

Use the Right Tool: Unit tests (Vitest) for pure functions, E2E tests (Playwright) for integration. Don't use Playwright to test Zod schemas.

Speed is a Feature: Fast tests get run. Slow tests get skipped. Optimize for sub-second unit tests and sub-10-second E2E tests.

Focus on Value: Test user-visible behavior, not implementation details. Test what your code does, not how it does it.

Embrace TDD Selectively: Write tests first for islands and public APIs. Skip TDD for static content and one-off components.

Try It Yourself

Next time you're tempted to write an E2E test, ask:

  • Am I calling a function directly? (→ Unit test)
  • Am I using page.goto()? (→ E2E test)

Next time your test suite feels slow, audit your test ratios:

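# Count lines of test code (unit *.test.ts and E2E *.spec.ts)
find tests \( -name "*.test.ts" -o -name "*.spec.ts" \) -print0 | xargs -0 cat | wc -l
# Count lines of source code
find src -name "*.ts" -not -name "*.test.ts" -print0 | xargs -0 cat | wc -l
# Calculate ratio: divide the first number by the second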

If your ratio exceeds 3:1, you probably have an over-testing problem.

Test ratios calculated from actual codebase metrics. See M3L-39 for migration details.