Data Migration Verification

Data migration verification is the process of ensuring data integrity and correctness when moving data between systems. While dual writes and expand-contract patterns help with the mechanics of migration, verification ensures you haven't lost, corrupted, or misinterpreted any data during the process.

Failed data migrations can result in lost revenue, compliance violations, and corrupted business records. Systematic verification is essential for high-stakes migrations.

Why Verification Matters

The Stakes

Data is permanent: Unlike code bugs that can be fixed with a new deploy, data corruption may be irreversible.

Examples of migration failures:

  • E-commerce company loses order history during database migration
  • Financial services firm miscalculates balances due to rounding errors
  • Healthcare provider loses patient records during system upgrade
  • SaaS company corrupts user data, violating privacy regulations

What Can Go Wrong

Data loss:

  • Records missing in target system
  • Fields not migrated
  • Relationships broken

Data corruption:

  • Type conversion errors
  • Character encoding issues
  • Precision loss in numeric fields
  • Date/time timezone problems

Data inconsistency:

  • Related records out of sync
  • Constraints violated
  • Business rules broken

Verification Strategies

Strategy 1: Count-Based Verification

Simplest check—ensure same number of records:

async function verifyRecordCounts() {
  const counts = await Promise.all([
    legacyDb.query('SELECT COUNT(*) as count FROM users'),
    modernDb.query('SELECT COUNT(*) as count FROM users'),
  ]);

  const legacyCount = counts[0][0].count;
  const modernCount = counts[1][0].count;

  if (legacyCount !== modernCount) {
    throw new Error(
      `Record count mismatch: legacy=${legacyCount}, modern=${modernCount}`
    );
  }

  logger.info('Record count verification passed', { count: legacyCount });
}

Limitations: Doesn't catch data corruption, only data loss.

Strategy 2: Checksum Verification

Compare aggregated checksums for fast validation:

async function verifyChecksums() {
  const [legacyChecksum, modernChecksum] = await Promise.all([
    calculateTableChecksum(legacyDb, 'users'),
    calculateTableChecksum(modernDb, 'users'),
  ]);

  if (legacyChecksum !== modernChecksum) {
    throw new Error('Checksum mismatch - data differs between systems');
  }

  logger.info('Checksum verification passed', { checksum: legacyChecksum });
}

async function calculateTableChecksum(
  db: Database,
  table: string
): Promise<string> {
  // Get sorted, consistent representation
  const rows = await db.query(
    `SELECT * FROM ${table} ORDER BY id`
  );

  // Calculate checksum
  const data = JSON.stringify(rows);
  return crypto.createHash('sha256').update(data).digest('hex');
}

Benefits: Fast, detects any data differences Limitations: Doesn't tell you what's different, only that something is

Strategy 3: Row-by-Row Comparison

Compare individual records for detailed validation:

async function verifyRowByRow() {
  const batchSize = 1000;
  let offset = 0;
  let mismatches = [];

  while (true) {
    const legacyRows = await legacyDb.query(
      'SELECT * FROM users ORDER BY id LIMIT ? OFFSET ?',
      [batchSize, offset]
    );

    if (legacyRows.length === 0) break;

    for (const legacyRow of legacyRows) {
      const modernRow = await modernDb.query(
        'SELECT * FROM users WHERE id = ?',
        [legacyRow.id]
      );

      if (!modernRow) {
        mismatches.push({
          id: legacyRow.id,
          issue: 'missing_in_modern',
          legacyData: legacyRow,
        });
      } else if (!areRowsEqual(legacyRow, modernRow[0])) {
        mismatches.push({
          id: legacyRow.id,
          issue: 'data_mismatch',
          legacyData: legacyRow,
          modernData: modernRow[0],
          differences: findDifferences(legacyRow, modernRow[0]),
        });
      }
    }

    offset += batchSize;
  }

  if (mismatches.length > 0) {
    logger.error('Row verification found mismatches', {
      count: mismatches.length,
      samples: mismatches.slice(0, 10),
    });
    throw new Error(`Found ${mismatches.length} mismatched rows`);
  }

  logger.info('Row-by-row verification passed', { rowsChecked: offset });
}

function areRowsEqual(a: any, b: any): boolean {
  // Normalize before comparison
  const normalize = (obj: any) => {
    const normalized = { ...obj };

    // Handle date comparison
    Object.keys(normalized).forEach((key) => {
      if (normalized[key] instanceof Date) {
        normalized[key] = normalized[key].toISOString();
      }
    });

    return normalized;
  };

  return JSON.stringify(normalize(a)) === JSON.stringify(normalize(b));
}

function findDifferences(a: any, b: any): string[] {
  const diffs = [];

  for (const key in a) {
    if (a[key] !== b[key]) {
      diffs.push(`${key}: ${a[key]} !== ${b[key]}`);
    }
  }

  return diffs;
}

Strategy 4: Sampling Verification

Verify a statistical sample for large datasets:

async function verifySample(sampleSize: number = 10000) {
  // Get total count
  const totalCount = await legacyDb.query('SELECT COUNT(*) FROM users');

  // Random sample
  const sampleIds = await legacyDb.query(
    'SELECT id FROM users ORDER BY RAND() LIMIT ?',
    [sampleSize]
  );

  let mismatches = 0;

  for (const { id } of sampleIds) {
    const [legacyRow, modernRow] = await Promise.all([
      legacyDb.query('SELECT * FROM users WHERE id = ?', [id]),
      modernDb.query('SELECT * FROM users WHERE id = ?', [id]),
    ]);

    if (!areRowsEqual(legacyRow[0], modernRow[0])) {
      mismatches++;
      logger.warn('Sample verification mismatch', {
        id,
        legacy: legacyRow[0],
        modern: modernRow[0],
      });
    }
  }

  const errorRate = mismatches / sampleSize;

  logger.info('Sample verification complete', {
    sampleSize,
    mismatches,
    errorRate,
    estimatedTotalErrors: Math.round(errorRate * totalCount[0].count),
  });

  if (errorRate > 0.001) {
    // More than 0.1% error rate
    throw new Error(`Error rate too high: ${(errorRate * 100).toFixed(2)}%`);
  }
}

Strategy 5: Business Rule Validation

Verify business logic constraints:

async function verifyBusinessRules() {
  const issues = [];

  // Rule 1: All orders must have a customer
  const orphanOrders = await modernDb.query(`
    SELECT o.id
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL
  `);

  if (orphanOrders.length > 0) {
    issues.push({
      rule: 'orders_must_have_customer',
      violations: orphanOrders.length,
      samples: orphanOrders.slice(0, 10),
    });
  }

  // Rule 2: Order total must equal sum of line items
  const invalidTotals = await modernDb.query(`
    SELECT
      o.id,
      o.total,
      SUM(li.quantity * li.unit_price) as calculated_total
    FROM orders o
    JOIN line_items li ON o.id = li.order_id
    GROUP BY o.id, o.total
    HAVING ABS(o.total - calculated_total) > 0.01
  `);

  if (invalidTotals.length > 0) {
    issues.push({
      rule: 'order_total_equals_line_items',
      violations: invalidTotals.length,
      samples: invalidTotals.slice(0, 10),
    });
  }

  // Rule 3: Account balances must be non-negative
  const negativeBalances = await modernDb.query(
    'SELECT id, balance FROM accounts WHERE balance < 0'
  );

  if (negativeBalances.length > 0) {
    issues.push({
      rule: 'account_balance_non_negative',
      violations: negativeBalances.length,
      samples: negativeBalances.slice(0, 10),
    });
  }

  if (issues.length > 0) {
    logger.error('Business rule violations found', { issues });
    throw new Error(`Found ${issues.length} business rule violations`);
  }

  logger.info('Business rule validation passed');
}

Comprehensive Verification Pipeline

class MigrationVerifier {
  async runFullVerification(): Promise<VerificationReport> {
    const report: VerificationReport = {
      startTime: new Date(),
      checks: [],
    };

    try {
      // Phase 1: Quick smoke tests
      await this.runCheck(report, 'record_counts', () =>
        this.verifyRecordCounts()
      );

      await this.runCheck(report, 'basic_checksums', () =>
        this.verifyChecksums()
      );

      // Phase 2: Detailed verification
      await this.runCheck(report, 'sample_comparison', () =>
        this.verifySample(10000)
      );

      // Phase 3: Business logic validation
      await this.runCheck(report, 'business_rules', () =>
        this.verifyBusinessRules()
      );

      // Phase 4: Relationship integrity
      await this.runCheck(report, 'foreign_keys', () =>
        this.verifyForeignKeys()
      );

      // Phase 5: Data quality
      await this.runCheck(report, 'data_quality', () =>
        this.verifyDataQuality()
      );

      report.endTime = new Date();
      report.status = 'passed';

      return report;
    } catch (error) {
      report.endTime = new Date();
      report.status = 'failed';
      report.error = error.message;

      return report;
    }
  }

  private async runCheck(
    report: VerificationReport,
    name: string,
    checkFn: () => Promise<void>
  ) {
    const start = Date.now();

    try {
      await checkFn();

      report.checks.push({
        name,
        status: 'passed',
        duration: Date.now() - start,
      });

      logger.info(`Verification check passed: ${name}`);
    } catch (error) {
      report.checks.push({
        name,
        status: 'failed',
        duration: Date.now() - start,
        error: error.message,
      });

      logger.error(`Verification check failed: ${name}`, { error });
      throw error;
    }
  }

  private async verifyForeignKeys() {
    // Check all foreign key relationships are valid
    const tables = await modernDb.getTables();

    for (const table of tables) {
      const foreignKeys = await modernDb.getForeignKeys(table);

      for (const fk of foreignKeys) {
        const orphans = await modernDb.query(`
          SELECT child.id
          FROM ${fk.childTable} child
          LEFT JOIN ${fk.parentTable} parent ON child.${fk.childColumn} = parent.${fk.parentColumn}
          WHERE parent.${fk.parentColumn} IS NULL
            AND child.${fk.childColumn} IS NOT NULL
        `);

        if (orphans.length > 0) {
          throw new Error(
            `Foreign key violation: ${fk.childTable}.${fk.childColumn} -> ${fk.parentTable}.${fk.parentColumn} (${orphans.length} orphans)`
          );
        }
      }
    }
  }

  private async verifyDataQuality() {
    const issues = [];

    // Check for NULL values where they shouldn't be
    const nullEmails = await modernDb.query(
      'SELECT COUNT(*) FROM users WHERE email IS NULL'
    );

    if (nullEmails[0].count > 0) {
      issues.push({ field: 'email', issue: 'unexpected_nulls', count: nullEmails[0].count });
    }

    // Check for data in expected ranges
    const futureDates = await modernDb.query(
      'SELECT COUNT(*) FROM orders WHERE created_at > NOW()'
    );

    if (futureDates[0].count > 0) {
      issues.push({ field: 'created_at', issue: 'future_dates', count: futureDates[0].count });
    }

    if (issues.length > 0) {
      throw new Error(`Data quality issues: ${JSON.stringify(issues)}`);
    }
  }
}

interface VerificationReport {
  startTime: Date;
  endTime?: Date;
  status?: 'passed' | 'failed';
  error?: string;
  checks: Array<{
    name: string;
    status: 'passed' | 'failed';
    duration: number;
    error?: string;
  }>;
}

Continuous Verification

Don't just verify once—verify continuously during migration:

class ContinuousVerifier {
  private interval: NodeJS.Timeout;

  start() {
    // Run verification every hour
    this.interval = setInterval(async () => {
      try {
        const report = await new MigrationVerifier().runFullVerification();

        await this.storeReport(report);

        metrics.gauge('migration.verification.status', report.status === 'passed' ? 1 : 0);

        if (report.status === 'failed') {
          await this.alertTeam(report);
        }
      } catch (error) {
        logger.error('Continuous verification failed', { error });
      }
    }, 60 * 60 * 1000); // 1 hour
  }

  stop() {
    clearInterval(this.interval);
  }

  private async storeReport(report: VerificationReport) {
    await db.insert('migration_verification_reports', {
      timestamp: report.startTime,
      status: report.status,
      duration: report.endTime.getTime() - report.startTime.getTime(),
      checks: JSON.stringify(report.checks),
    });
  }

  private async alertTeam(report: VerificationReport) {
    await slack.send({
      channel: '#data-migration',
      text: `⚠️ Migration verification failed!\n\nFailed checks:\n${report.checks
        .filter((c) => c.status === 'failed')
        .map((c) => `- ${c.name}: ${c.error}`)
        .join('\n')}`,
    });
  }
}

Key Takeaways

  • Verification is essential—data corruption can be irreversible
  • Use multiple verification strategies: counts, checksums, row comparison, sampling
  • Validate business rules and data quality, not just data presence
  • Run verification continuously during migration, not just once
  • Document verification results for audit trail
  • Automate verification to run frequently and catch issues early
  • Have rollback plans if verification fails
  • Test verification logic itself before relying on it

Further Reading

Sunsetting - Data Migration Verification | Sunsetting Learn