Parallel Run Pattern
The Parallel Run pattern is a risk mitigation technique that lets you validate a new implementation against production traffic before committing to it. You run the old and new code side by side, compare their outputs, and serve users only the old (trusted) results, so bugs and edge cases in the new code surface without reaching users.
This pattern is particularly valuable when replacing critical business logic, migrating algorithms, or modernizing systems where correctness is paramount.
Video Summary
The video presentation explores the Parallel Run pattern in depth:
- When and why to use parallel runs versus other migration strategies
- Architecture patterns for running code in parallel
- Strategies for comparing outputs and detecting differences
- Handling performance implications of running two systems
- Monitoring and alerting on discrepancies
- Real-world examples from companies using this at scale
- Transitioning from parallel run to full cutover
Key Concepts
1. Shadow Traffic: Run New Code Without User Impact
The fundamental principle: new code receives the same inputs as old code, but its outputs are ignored (for now).
async function processOrder(order: Order): Promise<OrderResult> {
  // Primary path: old, trusted implementation
  const primaryResult = await legacyOrderProcessor.process(order);
  // Shadow path: new implementation (runs but doesn't affect users)
  executeInBackground(async () => {
    try {
      const shadowResult = await newOrderProcessor.process(order);
      // Compare results
      if (!resultsMatch(primaryResult, shadowResult)) {
        await logMismatch({
          orderId: order.id,
          primary: primaryResult,
          shadow: shadowResult,
          timestamp: new Date(),
        });
      }
      // Track metrics
      metrics.increment('parallel_run.shadow.success');
    } catch (error) {
      // Shadow errors don't affect users
      logger.error('Shadow execution failed', { orderId: order.id, error });
      metrics.increment('parallel_run.shadow.error');
    }
  });
  // Always return trusted result
  return primaryResult;
}
Key benefits:
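- No user-facing risk from the new code's output, provided the shadow path has no side effects (no writes, charges, or emails)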
- Validates with real production data
- Finds edge cases you didn't anticipate
- Builds confidence before cutover
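The `executeInBackground` helper used in the example above is not defined in the text. One way it could be written is sketched below; the `onError` parameter and its default are assumptions made for illustration, not part of the original example.

```typescript
// Hypothetical helper: runs a task without letting its failure reach the caller.
type ErrorHandler = (error: unknown) => void;

function executeInBackground(
  task: () => Promise<void>,
  onError: ErrorHandler = (error) => console.error('Background task failed', error)
): Promise<void> {
  // Promise.resolve().then(task) also captures synchronous throws from task.
  // Unless onError itself throws, the returned promise never rejects,
  // so callers can leave it un-awaited without risking an unhandled rejection.
  return Promise.resolve()
    .then(task)
    .catch(onError);
}
```

Returning the promise (rather than `void`) is a design choice that keeps the helper easy to test: production code ignores the return value, while a test can await it.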
2. Comparative Analysis
Not all differences are bugs. Parallel runs help you understand what's changed:
interface ComparisonResult {
  matches: boolean;
  differences: Difference[];
  severity: 'critical' | 'major' | 'minor' | 'cosmetic';
}
function compareResults(
  primary: OrderResult,
  shadow: OrderResult
): ComparisonResult {
  const differences: Difference[] = [];
  // Critical: Different total amounts
  if (primary.totalAmount !== shadow.totalAmount) {
    differences.push({
      field: 'totalAmount',
      primary: primary.totalAmount,
      shadow: shadow.totalAmount,
      severity: 'critical',
    });
  }
  // Major: Different line items
  if (!arrayEquals(primary.lineItems, shadow.lineItems)) {
    differences.push({
      field: 'lineItems',
      primary: primary.lineItems,
      shadow: shadow.lineItems,
      severity: 'major',
    });
  }
  // Minor: Different formatting
  if (primary.formattedDate !== shadow.formattedDate) {
    differences.push({
      field: 'formattedDate',
      primary: primary.formattedDate,
      shadow: shadow.formattedDate,
      severity: 'minor',
    });
  }
  const highestSeverity = getHighestSeverity(differences);
  return {
    matches: differences.length === 0,
    differences,
    severity: highestSeverity || 'cosmetic',
  };
}
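`getHighestSeverity` is called above but not shown. A minimal sketch follows, assuming `Difference` has the shape implied by the objects pushed in `compareResults`; the `Severity` alias and `SEVERITY_ORDER` constant are names introduced here for illustration.

```typescript
type Severity = 'critical' | 'major' | 'minor' | 'cosmetic';

// Assumed shape, inferred from the objects pushed in compareResults.
interface Difference {
  field: string;
  primary: unknown;
  shadow: unknown;
  severity: Severity;
}

// Lower index = more severe.
const SEVERITY_ORDER: Severity[] = ['critical', 'major', 'minor', 'cosmetic'];

// Returns the most severe level present, or undefined when there are no differences.
function getHighestSeverity(differences: Difference[]): Severity | undefined {
  let highest: Severity | undefined;
  for (const { severity } of differences) {
    if (
      highest === undefined ||
      SEVERITY_ORDER.indexOf(severity) < SEVERITY_ORDER.indexOf(highest)
    ) {
      highest = severity;
    }
  }
  return highest;
}
```

Returning `undefined` for an empty list is what lets the caller fall back to `'cosmetic'` with `||`.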
3. Gradual Confidence Building
Use parallel runs to progressively increase confidence:
Phase 1: Development/Staging
- Run parallel on test data
- Fix obvious bugs
- Validate basic correctness
Phase 2: Shadow Production Traffic
- Run new code in shadow mode
- Compare outputs, log differences
- Fix issues found with real data
Phase 3: Canary with Parallel Validation
- Route 5% of traffic to new implementation
- Run old implementation in parallel for validation
- Compare results, but use new results
Phase 4: Full Cutover
- All traffic to new implementation
- Old implementation removed
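The four phases can be read as a small routing table: which implementation serves the user and which one runs alongside for comparison. The sketch below is one hedged way to encode that. The `Phase` names, the `planFor` function, and the `bucket` parameter are all hypothetical, and it assumes that non-canary traffic in Phase 3 stays in shadow mode, which the text does not specify.

```typescript
type Phase = 'staging' | 'shadow' | 'canary' | 'cutover';
type Impl = 'legacy' | 'new';

interface RunPlan {
  serve: Impl;           // whose result the user sees
  validate: Impl | null; // which implementation runs alongside for comparison
}

// bucket: a stable number in [0, 100) derived from the request,
// assumed to be computed elsewhere (for example from a hash of the user id).
function planFor(phase: Phase, bucket: number, canaryPercent = 5): RunPlan {
  switch (phase) {
    case 'staging':
    case 'shadow':
      // Phases 1 and 2: legacy serves, new code is only compared.
      return { serve: 'legacy', validate: 'new' };
    case 'canary':
      // Phase 3: a small slice is served by the new code, with legacy as the check.
      return bucket < canaryPercent
        ? { serve: 'new', validate: 'legacy' }
        : { serve: 'legacy', validate: 'new' };
    case 'cutover':
      // Phase 4: no parallel run remains.
      return { serve: 'new', validate: null };
  }
}
```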
4. Performance Considerations
Running two systems has costs:
// Option 1: Async shadow (doesn't slow primary path)
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);
  // Don't await - run in background
  shadowRun(payment, primaryResult);
  return primaryResult;
}
// Option 2: Sample subset of traffic
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);
  // Only shadow 10% of requests
  if (Math.random() < 0.1) {
    shadowRun(payment, primaryResult);
  }
  return primaryResult;
}
// Option 3: Replay traffic offline
async function processPayment(payment: Payment) {
  const primaryResult = await legacyProcessor.process(payment);
  // Log for offline replay
  await trafficLog.append({
    timestamp: Date.now(),
    input: payment,
    output: primaryResult,
  });
  return primaryResult;
}
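Option 2 samples with `Math.random()`, so a request that produced a mismatch may not be shadowed again on retry. A possible alternative, sketched here under the assumption that each request carries a stable string id, is to sample deterministically by hashing that id. The names `fnv1a` and `shouldShadow` are hypothetical.

```typescript
// FNV-1a 32-bit hash: small, dependency-free, and stable across processes.
// This variant mixes in UTF-16 code units rather than bytes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Gives the same answer every time for the same id.
// rate = 0 shadows nothing; rate = 1 shadows everything.
function shouldShadow(requestId: string, rate: number): boolean {
  const bucket = fnv1a(requestId) / 0x100000000; // in [0, 1)
  return bucket < rate;
}
```

With this in place, the condition in Option 2 would become something like `shouldShadow(payment.id, 0.1)`, assuming `payment.id` exists.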
Real-World Applications
Example 1: Shipping Cost Calculator Migration
Scenario: Replacing legacy shipping calculation with modern rules engine.
class ShippingCalculator {
  constructor(
    private legacy: LegacyShippingCalculator,
    private modern: ModernShippingCalculator,
    private featureFlags: FeatureFlags,
    private monitor: Monitor
  ) {}
  async calculate(shipment: Shipment): Promise<ShippingCost> {
    const startTime = Date.now();
    // Calculate with both systems. allSettled (unlike Promise.all) keeps a
    // failure in one implementation from rejecting the whole request.
    // Note: awaiting both still makes the response as slow as the slower system.
    const [legacyOutcome, modernOutcome] = await Promise.allSettled([
      this.legacy.calculate(shipment),
      this.modern.calculate(shipment),
    ]);
    const duration = Date.now() - startTime;
    // Feature flag determines which result to use
    const served = this.featureFlags.isEnabled('modern-shipping-calc')
      ? modernOutcome
      : legacyOutcome;
    if (served.status === 'rejected') {
      // Only a failure in the serving implementation reaches the caller
      throw served.reason;
    }
    if (legacyOutcome.status === 'fulfilled' && modernOutcome.status === 'fulfilled') {
      // Compare results
      const comparison = this.compareResults(legacyOutcome.value, modernOutcome.value);
      // Log discrepancies
      if (!comparison.matches) {
        await this.monitor.logDiscrepancy({
          shipmentId: shipment.id,
          legacy: legacyOutcome.value,
          modern: modernOutcome.value,
          differences: comparison.differences,
          severity: comparison.severity,
        });
      }
      // Metrics
      this.monitor.recordComparison({
        matches: comparison.matches,
        duration,
        severity: comparison.severity,
      });
    } else {
      // The non-serving implementation failed; nothing to compare
      logger.error('Parallel shipping calculation failed', { shipmentId: shipment.id });
    }
    return served.value;
  }
  private compareResults(
    legacy: ShippingCost,
    modern: ShippingCost
  ): ComparisonResult {
    const differences: Difference[] = [];
    // Difference uses primary/shadow field names; here primary = legacy, shadow = modern
    if (Math.abs(legacy.cost - modern.cost) > 0.01) {
      differences.push({
        field: 'cost',
        primary: legacy.cost,
        shadow: modern.cost,
        severity: 'critical',
      });
    }
    if (legacy.carrier !== modern.carrier) {
      differences.push({
        field: 'carrier',
        primary: legacy.carrier,
        shadow: modern.carrier,
        severity: 'major',
      });
    }
    return {
      matches: differences.length === 0,
      differences,
      severity: getHighestSeverity(differences) || 'cosmetic',
    };
  }
}
Results after 2 weeks of parallel running:
- Found 23 edge cases where new logic differed
- 5 were bugs in new system (fixed)
- 12 were intentional improvements (documented)
- 6 were legacy bugs we could now fix
- Confidence to switch over 100% of traffic
Example 2: Search Algorithm Replacement
Scenario: Replacing Elasticsearch with Algolia for better relevance.
class SearchService {
  async search(query: string, filters: Filters): Promise<SearchResults> {
    // Primary: Elasticsearch (current production)
    const esResults = await this.elasticsearch.search(query, filters);
    // Shadow: Algolia (being evaluated); deliberately not awaited
    void this.runShadowSearch(query, filters, esResults);
    return esResults;
  }
  private async runShadowSearch(
    query: string,
    filters: Filters,
    primaryResults: SearchResults
  ) {
    try {
      const shadowResults = await this.algolia.search(query, filters);
      // Compare result sets
      const comparison = this.compareSearchResults(primaryResults, shadowResults);
      // Log for analysis
      await this.analytics.logSearchComparison({
        query,
        filters,
        primaryResultCount: primaryResults.hits.length,
        shadowResultCount: shadowResults.hits.length,
        topResultsMatch: comparison.topResultsMatch,
        orderSimilarity: comparison.orderSimilarity,
        relevanceScoreDelta: comparison.relevanceScoreDelta,
      });
      // Alert on significant differences
      if (comparison.orderSimilarity < 0.7) {
        await this.alerting.notify({
          type: 'search_results_divergence',
          query,
          similarity: comparison.orderSimilarity,
        });
      }
    } catch (error) {
      logger.error('Shadow search failed', { query, error });
      metrics.increment('shadow_search.errors');
    }
  }
  private compareSearchResults(
    primary: SearchResults,
    shadow: SearchResults
  ) {
    // Top 10 results
    const primaryTop10 = primary.hits.slice(0, 10).map((h) => h.id);
    const shadowTop10 = shadow.hits.slice(0, 10).map((h) => h.id);
    // What fraction of the top results match? Divide by the actual list length,
    // not a fixed 10, so queries with fewer than 10 hits are not understated.
    const topCount = Math.max(primaryTop10.length, shadowTop10.length);
    const topResultsMatch =
      topCount === 0 ? 1 : intersection(primaryTop10, shadowTop10).length / topCount;
    // Rank correlation
    const orderSimilarity = this.calculateRankCorrelation(
      primaryTop10,
      shadowTop10
    );
    // Average relevance score difference
    const relevanceScoreDelta = this.averageScoreDifference(primary, shadow);
    return {
      topResultsMatch,
      orderSimilarity,
      relevanceScoreDelta,
    };
  }
}
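`calculateRankCorrelation` is left undefined above. One plausible choice, shown here as a standalone function, is Spearman's rank correlation computed over the ids that appear in both lists. This is an assumption rather than the method the example necessarily intends, and it assumes ids are unique within each list.

```typescript
// Spearman's rho over the ids present in both lists, ranked within that shared set.
// Returns a value in [-1, 1]; 1 means the shared ids appear in the same relative order.
function calculateRankCorrelation(primary: string[], shadow: string[]): number {
  const primarySet = new Set(primary);
  const shadowSet = new Set(shadow);
  const sharedInPrimaryOrder = primary.filter((id) => shadowSet.has(id));
  const sharedInShadowOrder = shadow.filter((id) => primarySet.has(id));
  const n = sharedInPrimaryOrder.length;

  if (n < 2) {
    // Correlation is undefined for fewer than two shared ids.
    // Treat identical lists of length 0 or 1 as a perfect match, anything else as no match.
    return primary.length === shadow.length && n === primary.length ? 1 : 0;
  }

  const shadowRank = new Map<string, number>();
  sharedInShadowOrder.forEach((id, index) => shadowRank.set(id, index));

  let sumSquaredDiff = 0;
  sharedInPrimaryOrder.forEach((id, index) => {
    const d = index - (shadowRank.get(id) as number);
    sumSquaredDiff += d * d;
  });

  // Standard Spearman formula for rankings without ties.
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}
```

Because ids missing from one list are ignored here, this measure says nothing about overlap; that is what `topResultsMatch` covers.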
Example 3: Tax Calculation Migration
Scenario: Moving from in-house tax calculation to third-party service (Avalara).
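class TaxCalculator {
  async calculateTax(transaction: Transaction): Promise<TaxResult> {
    // allSettled (unlike Promise.all) keeps an Avalara failure from rejecting the request
    const [inHouseOutcome, avalaraOutcome] = await Promise.allSettled([
      this.calculateInHouse(transaction),
      this.calculateWithAvalara(transaction),
    ]);
    if (inHouseOutcome.status === 'rejected') {
      throw inHouseOutcome.reason;
    }
    const inHouseResult = inHouseOutcome.value;
    if (avalaraOutcome.status === 'rejected') {
      // Shadow errors don't affect users
      logger.error('Shadow tax calculation failed', {
        transactionId: transaction.id,
        error: avalaraOutcome.reason,
      });
      return inHouseResult;
    }
    const avalaraResult = avalaraOutcome.value;
    // Compare amounts
    const difference = Math.abs(inHouseResult.taxAmount - avalaraResult.taxAmount);
    // Guard against division by zero (tax-exempt) and negative amounts (refunds)
    const baseline = Math.abs(inHouseResult.taxAmount);
    const percentDifference =
      baseline === 0 ? (difference === 0 ? 0 : Infinity) : (difference / baseline) * 100;
    // Log significant differences
    if (percentDifference > 1) {
      await this.reportTaxDiscrepancy({
        transactionId: transaction.id,
        inHouse: inHouseResult.taxAmount,
        avalara: avalaraResult.taxAmount,
        difference,
        percentDifference,
        jurisdiction: transaction.jurisdiction,
      });
    }
    // For now, use in-house (we trust it)
    // Later, switch to Avalara once confident
    return inHouseResult;
  }
}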
Common Pitfalls
1. Performance Impact on User Experience
Problem: Awaiting both implementations on the request path slows responses: to the slower of the two when they run concurrently, or to their sum when they run sequentially.
Solution: Run shadow asynchronously, don't await results:
// Bad: Waits for both
const [oldResult, newResult] = await Promise.all([oldImpl(), newImpl()]); // `new` is a reserved word, so it cannot be a variable name
// Good: Doesn't wait for shadow
const result = await oldImpl();
fireAndForget(newImpl()); // Runs async
return result;
2. Comparison Logic Bugs
Problem: Difference detection has bugs, creating false positives/negatives.
Solution: Test comparison logic thoroughly:
describe('Result Comparison', () => {
  it('should detect amount differences', () => {
    const result = compareResults(
      { amount: 100.00, currency: 'USD' },
      { amount: 100.01, currency: 'USD' }
    );
    expect(result.matches).toBe(false);
  });
  it('should ignore cosmetic differences', () => {
    const result = compareResults(
      { timestamp: '2024-01-01T00:00:00Z' },
      { timestamp: '2024-01-01T00:00:00.000Z' }
    );
    expect(result.matches).toBe(true);
  });
});
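For the second test above to pass, the comparison has to treat the two timestamp strings as the same instant rather than compare them character by character. A minimal sketch of such a check follows; `sameInstant` is a hypothetical helper name, and it assumes timestamps are ISO 8601 strings.

```typescript
// Treats two ISO 8601 strings as equal when they denote the same instant.
function sameInstant(a: string, b: string): boolean {
  const ta = Date.parse(a);
  const tb = Date.parse(b);
  // Unparseable input falls back to exact string comparison.
  if (Number.isNaN(ta) || Number.isNaN(tb)) {
    return a === b;
  }
  return ta === tb;
}
```

A comparison function could call this for timestamp fields instead of `!==`, so that formatting differences alone are not reported as mismatches.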
3. Alert Fatigue
Problem: Too many difference alerts, so the team starts ignoring them.
Solution: Categorize by severity, page only on critical differences, and route the rest to lower-urgency channels:
if (comparison.severity === 'critical') {
  await pagerDuty.alert(comparison);
} else if (comparison.severity === 'major') {
  await slack.notify(comparison);
} else {
  await datadog.log(comparison);
}
4. Forgetting to Remove Parallel Run Code
Problem: Parallel run infrastructure becomes permanent, adding complexity.
Solution: Set a deadline and cleanup plan before starting:
// Add TODO with removal date
// TODO: Remove parallel run code after 2024-03-01
if (ENABLE_PARALLEL_RUN) {
  runShadowImplementation();
}
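A TODO comment is easy to overlook. One optional way to make the deadline harder to miss, sketched here as an assumption rather than a recommendation from the text, is to encode the removal date in the guard itself so shadow runs stop once it passes. `PARALLEL_RUN_DEADLINE` and `parallelRunActive` are hypothetical names.

```typescript
// Hypothetical guard: shadow runs stop automatically once the agreed removal date passes.
const PARALLEL_RUN_DEADLINE = new Date('2024-03-01T00:00:00Z');

function parallelRunActive(enabled: boolean, now: Date = new Date()): boolean {
  return enabled && now.getTime() < PARALLEL_RUN_DEADLINE.getTime();
}
```

This only stops the shadow path from executing; the code itself still has to be deleted as part of the cleanup plan.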
Key Takeaways
- Parallel runs validate new implementations against real production traffic without exposing users to the new code's results
- The shadow path receives the same inputs, but its outputs are logged, not used
- Comparing results helps identify bugs, edge cases, and intentional differences
- Run asynchronously or sample traffic to avoid performance impact
- Categorize differences by severity to focus on critical issues
- This pattern builds confidence before full cutover
- Remove parallel run infrastructure after migration completes
Further Reading
- Dark Launching - Related pattern from Martin Fowler
- GitHub's Scientist Library - Ruby library for parallel runs
- Testing in Production - Why it's sometimes necessary
- Experimentation Platform Design - Netflix's approach to validation