Parallel Run Pattern

The Parallel Run pattern is a risk mitigation technique that lets you validate a new implementation against production traffic before fully committing to it. You run the old and new code side by side, compare their outputs, and serve users only the old (trusted) results, so bugs and edge cases in the new code surface without reaching users.

This pattern is particularly valuable when replacing critical business logic, migrating algorithms, or modernizing systems where correctness is paramount.

Video Summary

The video presentation explores the Parallel Run pattern in depth:

When and why to use parallel runs versus other migration strategies
Architecture patterns for running code in parallel
Strategies for comparing outputs and detecting differences
Handling performance implications of running two systems
Monitoring and alerting on discrepancies
Real-world examples from companies using this at scale
Transitioning from parallel run to full cutover

Key Concepts

1. Shadow Traffic: Run New Code Without User Impact

The fundamental principle: new code receives the same inputs as old code, but its outputs are ignored (for now).

async function processOrder(order: Order): Promise<OrderResult> {
  // Primary path: old, trusted implementation
  const primaryResult = await legacyOrderProcessor.process(order);

  // Shadow path: new implementation (runs but doesn't affect users)
  executeInBackground(async () => {
    try {
      const shadowResult = await newOrderProcessor.process(order);

      // Compare results
      if (!resultsMatch(primaryResult, shadowResult)) {
        await logMismatch({
          orderId: order.id,
          primary: primaryResult,
          shadow: shadowResult,
          timestamp: new Date(),
        });
      }

      // Track metrics
      metrics.increment('parallel_run.shadow.success');
    } catch (error) {
      // Shadow errors don't affect users
      logger.error('Shadow execution failed', { orderId: order.id, error });
      metrics.increment('parallel_run.shadow.error');
    }
  });

  // Always return trusted result
  return primaryResult;
}

Key benefits:

Zero risk to users
Validates with real production data
Finds edge cases you didn't anticipate
Builds confidence before cutover
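
The `processOrder` listing above calls `executeInBackground` without defining it. A minimal sketch of one possible shape follows. The signature, the optional `onError` parameter, and the returned promise are assumptions made for illustration, not part of the original code. The idea is that the shadow task starts off the request path and any failure is routed to a handler instead of surfacing as an unhandled rejection.

```typescript
// Hypothetical helper: runs a shadow task off the request path.
// The returned promise never rejects. Request handlers can ignore it,
// while tests can await it to know the task has settled.
function executeInBackground(
  task: () => Promise<void>,
  onError: (error: unknown) => void = () => {}
): Promise<void> {
  // Promise.resolve().then(...) defers the task to a later microtask and
  // turns a synchronous throw inside `task` into a rejection.
  return Promise.resolve()
    .then(task)
    .catch((error: unknown) => {
      try {
        onError(error);
      } catch {
        // A failing error handler must not break the caller either.
      }
    });
}
```

In a request handler the return value would simply be dropped, which matches how the listing calls the helper. A production version would probably also bound the task with a timeout so slow shadow calls cannot pile up.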

2. Comparative Analysis

Not all differences are bugs. Classifying each difference by severity helps you separate real regressions from harmless or intentional changes:

interface ComparisonResult {
  matches: boolean;
  differences: Difference[];
  severity: 'critical' | 'major' | 'minor' | 'cosmetic';
}

function compareResults(
  primary: OrderResult,
  shadow: OrderResult
): ComparisonResult {
  const differences: Difference[] = [];

  // Critical: Different total amounts
  if (primary.totalAmount !== shadow.totalAmount) {
    differences.push({
      field: 'totalAmount',
      primary: primary.totalAmount,
      shadow: shadow.totalAmount,
      severity: 'critical',
    });
  }

  // Major: Different line items
  if (!arrayEquals(primary.lineItems, shadow.lineItems)) {
    differences.push({
      field: 'lineItems',
      primary: primary.lineItems,
      shadow: shadow.lineItems,
      severity: 'major',
    });
  }

  // Minor: Different formatting
  if (primary.formattedDate !== shadow.formattedDate) {
    differences.push({
      field: 'formattedDate',
      primary: primary.formattedDate,
      shadow: shadow.formattedDate,
      severity: 'minor',
    });
  }

  const highestSeverity = getHighestSeverity(differences);

  return {
    matches: differences.length === 0,
    differences,
    severity: highestSeverity || 'cosmetic',
  };
}
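
`compareResults` relies on a `getHighestSeverity` helper that is not shown. A minimal sketch, assuming the `Difference` shape implied by the listing above and an ordering from `critical` (most severe) down to `cosmetic`:

```typescript
type Severity = 'critical' | 'major' | 'minor' | 'cosmetic';

// Assumed shape, inferred from the objects pushed in compareResults.
interface Difference {
  field: string;
  primary: unknown;
  shadow: unknown;
  severity: Severity;
}

// Lower index means more severe.
const SEVERITY_ORDER: readonly Severity[] = ['critical', 'major', 'minor', 'cosmetic'];

// Returns the most severe level present, or undefined for an empty list,
// so that `getHighestSeverity(differences) || 'cosmetic'` falls back as in the listing.
function getHighestSeverity(differences: readonly Difference[]): Severity | undefined {
  let best: Severity | undefined;
  for (const { severity } of differences) {
    if (
      best === undefined ||
      SEVERITY_ORDER.indexOf(severity) < SEVERITY_ORDER.indexOf(best)
    ) {
      best = severity;
    }
  }
  return best;
}
```

Keeping the ordering in one array means a new severity level only has to be added in one place.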

3. Gradual Confidence Building

Use parallel runs to progressively increase confidence:

Phase 1: Development/Staging

Run parallel on test data
Fix obvious bugs
Validate basic correctness

Phase 2: Shadow Production Traffic

Run new code in shadow mode
Compare outputs, log differences
Fix issues found with real data

Phase 3: Canary with Parallel Validation

Route 5% of traffic to new implementation
Run old implementation in parallel for validation
Compare results, but use new results

Phase 4: Full Cutover

All traffic to new implementation
Old implementation removed
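
The production phases above can be read as a per-request routing decision: which implementation serves the user, and which one runs alongside for comparison. The sketch below is one way to express that. The names `Phase`, `RequestPlan`, and `planRequest` are hypothetical, Phase 1 is left out because it runs on test data, and keeping the non-canary share of Phase 3 traffic in shadow mode is an assumption rather than something the phases spell out.

```typescript
type Phase = 'shadow' | 'canary' | 'cutover';
type Impl = 'legacy' | 'modern';

interface RequestPlan {
  serveFrom: Impl;           // whose result the user receives
  validateWith: Impl | null; // which implementation runs alongside for comparison
}

// `roll` is a number in [0, 1), for example Math.random().
// Passing it in keeps the function deterministic and easy to test.
function planRequest(phase: Phase, canaryFraction: number, roll: number): RequestPlan {
  switch (phase) {
    case 'shadow':
      // Phase 2: users get legacy results, the new code runs in shadow.
      return { serveFrom: 'legacy', validateWith: 'modern' };
    case 'canary':
      // Phase 3: a slice of traffic is served by the new code and validated
      // against legacy; the rest is assumed to stay in shadow mode.
      return roll < canaryFraction
        ? { serveFrom: 'modern', validateWith: 'legacy' }
        : { serveFrom: 'legacy', validateWith: 'modern' };
    case 'cutover':
      // Phase 4: all traffic on the new code, nothing left to compare.
      return { serveFrom: 'modern', validateWith: null };
  }
}
```

With `canaryFraction` set to `0.05`, this corresponds to the 5% split described in Phase 3.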

4. Performance Considerations

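Running two systems costs extra compute and downstream load, and it adds latency if the shadow call sits on the request path. Three ways to contain that cost: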

// Option 1: Async shadow (doesn't slow primary path)
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);

  // Don't await - run in background
  shadowRun(payment, primaryResult);

  return primaryResult;
}

// Option 2: Sample subset of traffic
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);

  // Only shadow 10% of requests
  if (Math.random() < 0.1) {
    shadowRun(payment, primaryResult);
  }

  return primaryResult;
}

// Option 3: Replay traffic offline
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);

  // Log for offline replay
  await trafficLog.append({
    timestamp: Date.now(),
    input: payment,
    output: primaryResult,
  });

  return primaryResult;
}
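
Option 3 only records traffic; the comparison happens later, when the recorded inputs are fed to the new implementation. A minimal sketch of that replay step follows. The `LoggedCall` shape mirrors the object appended to `trafficLog` above, while `replay`, `ReplayMismatch`, and the `equal` callback are hypothetical names introduced for illustration.

```typescript
interface LoggedCall<I, O> {
  timestamp: number;
  input: I;
  output: O;
}

interface ReplayMismatch<I, O> {
  input: I;
  recorded: O;
  replayed: O | null; // null when the candidate threw
  error?: unknown;
}

// Replays recorded inputs through the candidate implementation and collects
// every case where it disagrees with the recorded output or throws.
async function replay<I, O>(
  entries: readonly LoggedCall<I, O>[],
  candidate: (input: I) => Promise<O>,
  equal: (a: O, b: O) => boolean
): Promise<ReplayMismatch<I, O>[]> {
  const mismatches: ReplayMismatch<I, O>[] = [];
  for (const entry of entries) {
    try {
      const replayed = await candidate(entry.input);
      if (!equal(entry.output, replayed)) {
        mismatches.push({ input: entry.input, recorded: entry.output, replayed });
      }
    } catch (error) {
      mismatches.push({ input: entry.input, recorded: entry.output, replayed: null, error });
    }
  }
  return mismatches;
}
```

Replay fits logic whose result depends only on the logged input. If the legacy code also read state that has since changed (prices, inventory, exchange rates), a replayed mismatch may reflect the state change rather than a bug.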

Real-World Applications

Example 1: Shipping Cost Calculator Migration

Scenario: Replacing legacy shipping calculation with modern rules engine.

class ShippingCalculator {
  constructor(
    private legacy: LegacyShippingCalculator,
    private modern: ModernShippingCalculator,
    private featureFlags: FeatureFlags,
    private monitor: Monitor
  ) {}

  async calculate(shipment: Shipment): Promise<ShippingCost> {
    const startTime = Date.now();

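    // Calculate with both systems. Both are awaited because the feature flag
    // below may return either result, so latency is that of the slower call
    // and a rejection from either implementation fails the request.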
    const [legacyResult, modernResult] = await Promise.all([
      this.legacy.calculate(shipment),
      this.modern.calculate(shipment),
    ]);

    const duration = Date.now() - startTime;

    // Compare results
    const comparison = this.compareResults(legacyResult, modernResult);

    // Log discrepancies
    if (!comparison.matches) {
      await this.monitor.logDiscrepancy({
        shipmentId: shipment.id,
        legacy: legacyResult,
        modern: modernResult,
        differences: comparison.differences,
        severity: comparison.severity,
      });
    }

    // Metrics
    this.monitor.recordComparison({
      matches: comparison.matches,
      duration,
      severity: comparison.severity,
    });

    // Feature flag determines which result to use
    if (this.featureFlags.isEnabled('modern-shipping-calc')) {
      return modernResult;
    }

    return legacyResult;
  }

  private compareResults(
    legacy: ShippingCost,
    modern: ShippingCost
  ): ComparisonResult {
    const differences: Difference[] = [];

    if (Math.abs(legacy.cost - modern.cost) > 0.01) {
      differences.push({
        field: 'cost',
        legacy: legacy.cost,
        modern: modern.cost,
        severity: 'critical',
      });
    }

    if (legacy.carrier !== modern.carrier) {
      differences.push({
        field: 'carrier',
        legacy: legacy.carrier,
        modern: modern.carrier,
        severity: 'major',
      });
    }

    return {
      matches: differences.length === 0,
      differences,
      // Differences are pushed most-severe first, so the first entry carries the highest severity
      severity: differences[0]?.severity || 'cosmetic',
    };
  }
}

Results after 2 weeks of parallel running:

Found 23 edge cases where new logic differed
5 were bugs in new system (fixed)
12 were intentional improvements (documented)
6 were legacy bugs we could now fix
Confidence to switch over 100% of traffic
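
One detail in the comparator above deserves care. `Math.abs(legacy.cost - modern.cost) > 0.01` compares binary floating-point values, so a one-cent gap can land on either side of the threshold depending on the operands: in JavaScript, `10.01 - 10.00` evaluates to slightly less than `0.01`, while `0.30 - 0.29` evaluates to slightly more. A sketch of a more predictable check, assuming a two-decimal currency and using hypothetical helper names, converts both amounts to integer cents first:

```typescript
// Converts a decimal currency amount to integer cents.
// Assumes a two-decimal currency and amounts well within the safe integer range.
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// Absolute difference between two amounts, in whole cents.
function centsApart(a: number, b: number): number {
  return Math.abs(toCents(a) - toCents(b));
}

// Flags a cost difference only when the amounts are more than
// `toleranceCents` apart (default: one cent, as in the original check).
function costsDiffer(legacyCost: number, modernCost: number, toleranceCents = 1): boolean {
  return centsApart(legacyCost, modernCost) > toleranceCents;
}
```

With this version a one-cent gap is always tolerated and a two-cent gap is always flagged, so the count of `critical` discrepancies no longer depends on rounding noise.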

Example 2: Search Algorithm Replacement

Scenario: Replacing Elasticsearch with Algolia for better relevance.

class SearchService {
  async search(query: string, filters: Filters): Promise<SearchResults> {
    // Primary: Elasticsearch (current production)
    const esResults = await this.elasticsearch.search(query, filters);

    // Shadow: Algolia (being evaluated)
    this.runShadowSearch(query, filters, esResults);

    return esResults;
  }

  private async runShadowSearch(
    query: string,
    filters: Filters,
    primaryResults: SearchResults
  ) {
    try {
      const shadowResults = await this.algolia.search(query, filters);

      // Compare result sets
      const comparison = this.compareSearchResults(primaryResults, shadowResults);

      // Log for analysis
      await this.analytics.logSearchComparison({
        query,
        filters,
        primaryResultCount: primaryResults.hits.length,
        shadowResultCount: shadowResults.hits.length,
        topResultsMatch: comparison.topResultsMatch,
        orderSimilarity: comparison.orderSimilarity,
        relevanceScoreDelta: comparison.relevanceScoreDelta,
      });

      // Alert on significant differences
      if (comparison.orderSimilarity < 0.7) {
        await this.alerting.notify({
          type: 'search_results_divergence',
          query,
          similarity: comparison.orderSimilarity,
        });
      }
    } catch (error) {
      logger.error('Shadow search failed', { query, error });
      metrics.increment('shadow_search.errors');
    }
  }

  private compareSearchResults(
    primary: SearchResults,
    shadow: SearchResults
  ) {
    // Top 10 results
    const primaryTop10 = primary.hits.slice(0, 10).map((h) => h.id);
    const shadowTop10 = shadow.hits.slice(0, 10).map((h) => h.id);

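    // What fraction of the top results match? Divide by the number of results
    // actually compared, since either side may return fewer than 10 hits.
    const comparedCount = Math.max(primaryTop10.length, shadowTop10.length);
    const topResultsMatch =
      comparedCount === 0 ? 1 : intersection(primaryTop10, shadowTop10).length / comparedCount;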

    // Rank correlation
    const orderSimilarity = this.calculateRankCorrelation(
      primaryTop10,
      shadowTop10
    );

    // Average relevance score difference
    const relevanceScoreDelta = this.averageScoreDifference(primary, shadow);

    return {
      topResultsMatch,
      orderSimilarity,
      relevanceScoreDelta,
    };
  }
}
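
`calculateRankCorrelation` is referenced above but not defined. One plausible choice, shown here as a sketch rather than as the original implementation, is Spearman's rank correlation computed over the IDs that appear in both lists. It assumes string IDs that are unique within each list, and it returns 0 when fewer than two IDs are shared, which is an arbitrary convention.

```typescript
// Spearman rank correlation between two ordered ID lists, computed over the
// IDs present in both. Returns a value in [-1, 1]; 1 means identical order.
function calculateRankCorrelation(
  primaryIds: readonly string[],
  shadowIds: readonly string[]
): number {
  const shadowSet = new Set(shadowIds);
  // Shared IDs in primary order; the array index is the primary rank.
  const shared = primaryIds.filter((id) => shadowSet.has(id));
  const n = shared.length;
  if (n < 2) return 0;

  // Rank of each shared ID in the shadow list, re-ranked among shared IDs only.
  const sharedSet = new Set(shared);
  const shadowRank = new Map<string, number>();
  shadowIds
    .filter((id) => sharedSet.has(id))
    .forEach((id, rank) => shadowRank.set(id, rank));

  let sumSquaredDiff = 0;
  shared.forEach((id, primaryRank) => {
    const d = primaryRank - (shadowRank.get(id) ?? 0);
    sumSquaredDiff += d * d;
  });

  // Spearman's rho for rankings without ties.
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}
```

Because the measure ignores IDs that appear in only one list, it complements `topResultsMatch` rather than replacing it: two result sets with little overlap can still score 1 here if the few shared items keep their relative order.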

Example 3: Tax Calculation Migration

Scenario: Moving from in-house tax calculation to third-party service (Avalara).

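class TaxCalculator {
  async calculateTax(transaction: Transaction): Promise<TaxResult> {
    // Primary: in-house calculation (trusted)
    const inHouseResult = await this.calculateInHouse(transaction);

    // Shadow: Avalara. Not awaited, so a slow or failing third-party call
    // cannot delay or fail the user's request.
    void this.compareWithAvalara(transaction, inHouseResult);

    // For now, use in-house (we trust it)
    // Later, switch to Avalara once confident
    return inHouseResult;
  }

  private async compareWithAvalara(
    transaction: Transaction,
    inHouseResult: TaxResult
  ): Promise<void> {
    try {
      const avalaraResult = await this.calculateWithAvalara(transaction);

      // Compare amounts
      const difference = Math.abs(inHouseResult.taxAmount - avalaraResult.taxAmount);
      // Guard against dividing by zero on tax-exempt transactions:
      // no difference counts as 0%, any difference is reported as 100%.
      const percentDifference =
        inHouseResult.taxAmount === 0
          ? (difference === 0 ? 0 : 100)
          : (difference / inHouseResult.taxAmount) * 100;

      // Log significant differences
      if (percentDifference > 1) {
        await this.reportTaxDiscrepancy({
          transactionId: transaction.id,
          inHouse: inHouseResult.taxAmount,
          avalara: avalaraResult.taxAmount,
          difference,
          percentDifference,
          jurisdiction: transaction.jurisdiction,
        });
      }
    } catch (error) {
      // Shadow errors don't affect users
      logger.error('Shadow tax calculation failed', { transactionId: transaction.id, error });
    }
  }
}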

Common Pitfalls

1. Performance Impact on User Experience

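Problem: Awaiting both implementations slows every response. The request takes as long as the slower of the two when they run concurrently, or the sum of both when they run in sequence.

Solution: Run the shadow asynchronously and don't await its result: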

// Bad: Waits for both
const [oldResult, newResult] = await Promise.all([oldImpl(), newImpl()]);

// Good: Doesn't wait for shadow
const result = await oldImpl();
fireAndForget(newImpl()); // Runs async; fireAndForget must catch rejections
return result;

2. Comparison Logic Bugs

Problem: The comparison code itself has bugs, producing false positives or false negatives.

Solution: Test comparison logic thoroughly:

describe('Result Comparison', () => {
  it('should detect amount differences', () => {
    const result = compareResults(
      { amount: 100.00, currency: 'USD' },
      { amount: 100.01, currency: 'USD' }
    );
    expect(result.matches).toBe(false);
  });

  it('should ignore cosmetic differences', () => {
    const result = compareResults(
      { timestamp: '2024-01-01T00:00:00Z' },
      { timestamp: '2024-01-01T00:00:00.000Z' }
    );
    expect(result.matches).toBe(true);
  });
});
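
The second test expects two spellings of the same instant to count as a match. One way to get there, sketched here with a hypothetical `sameInstant` helper, is to compare parsed timestamps instead of raw strings:

```typescript
// Treats two ISO 8601 timestamps as equal when they denote the same instant,
// regardless of formatting details such as optional milliseconds.
function sameInstant(a: string, b: string): boolean {
  const ta = Date.parse(a);
  const tb = Date.parse(b);
  // Unparseable values never compare equal, so they surface as differences.
  if (Number.isNaN(ta) || Number.isNaN(tb)) return false;
  return ta === tb;
}
```

A comparator would call this for timestamp fields and keep strict equality for everything else, so that normalization stays limited to fields where formatting noise is expected.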

3. Alert Fatigue

Problem: Too many difference alerts, and the team starts ignoring them.

Solution: Categorize by severity and page only on critical differences:

if (comparison.severity === 'critical') {
  await pagerDuty.alert(comparison);
} else if (comparison.severity === 'major') {
  await slack.notify(comparison);
} else {
  await datadog.log(comparison);
}
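
Routing by severity still produces one notification per discrepancy. A further step, sketched below under the assumption that lower-severity discrepancies are collected in batches, is to group them and send one digest per batch. `Discrepancy` and `summarizeDiscrepancies` are hypothetical names.

```typescript
type Severity = 'critical' | 'major' | 'minor' | 'cosmetic';

interface Discrepancy {
  field: string;
  severity: Severity;
}

// Groups discrepancies into one count per "severity:field" pair so a batch
// can be reported as a single digest message instead of one alert each.
function summarizeDiscrepancies(items: readonly Discrepancy[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { field, severity } of items) {
    const key = `${severity}:${field}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

A digest such as "major:carrier seen 40 times in the last hour" usually says more than forty separate messages, and a sudden jump in one key is easier to spot.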

4. Forgetting to Remove Parallel Run Code

Problem: Parallel run infrastructure becomes permanent, adding complexity.

Solution: Set a deadline and cleanup plan before starting:

// Add TODO with removal date
// TODO: Remove parallel run code after 2024-03-01
if (ENABLE_PARALLEL_RUN) {
  runShadowImplementation();
}
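
A TODO comment is easy to overlook. One way to make the deadline enforceable, shown as a sketch with a hypothetical `shouldRunShadow` helper, is to put the removal date into the guard itself so the shadow path switches off once the date passes:

```typescript
// Returns true only while the parallel run is switched on and the agreed
// removal date has not passed. `removeAfter` is an ISO 8601 date string.
function shouldRunShadow(
  enabled: boolean,
  removeAfter: string,
  now: Date = new Date()
): boolean {
  const deadline = Date.parse(removeAfter);
  if (Number.isNaN(deadline)) {
    throw new Error(`Invalid removal date: ${removeAfter}`);
  }
  return enabled && now.getTime() <= deadline;
}
```

Called as `shouldRunShadow(ENABLE_PARALLEL_RUN, '2024-03-01')`, this mirrors the TODO above. Switching off silently is a design choice; a unit test that fails once the date has passed is an alternative that forces the cleanup conversation instead.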

Key Takeaways

Parallel runs validate new implementations against real production traffic while users continue to receive the trusted results
The shadow implementation receives the same inputs, but its outputs are logged, not used
Comparing results helps identify bugs, edge cases, and intentional differences
Run asynchronously or sample traffic to avoid performance impact
Categorize differences by severity to focus on critical issues
This pattern builds confidence before full cutover
Remove parallel run infrastructure after migration completes