Staging passes. CI is green. A teammate adds a region column to the orders table, backfills 25,000 rows, and opens a PR. Everything looks clean.
The migration introduces a 92x query regression that nobody will catch until production.
Except this time, Scry caught it first.
The Migration
Picture a typical e-commerce app. The orders table has ~25,000 rows, a handful of indexes, and a steady workload of filters and joins. A teammate adds a region column for a new geo-filtering feature:
-- Add the region column
ALTER TABLE orders ADD COLUMN region VARCHAR(50);
-- Backfill existing orders with region data
UPDATE orders SET region = CASE
  WHEN (id % 5) = 0 THEN 'west'
  WHEN (id % 5) = 1 THEN 'east'
  WHEN (id % 5) = 2 THEN 'central'
  WHEN (id % 5) = 3 THEN 'south'
  ELSE 'north'
END;
-- NOTE: No index added!
In staging, with 100 rows, the filter query returns in under a millisecond. CI is green. The PR gets approved.
Why Staging Alone Missed It
Staging has two structural blind spots:
Wrong data volume. 100 rows vs. 25,000 (or 25 million). PostgreSQL’s query planner makes different choices at different scales – on 100 rows a sequential scan is genuinely faster than an index lookup, so nothing looks wrong. On 25,000 rows, the same sequential scan is a performance cliff.
Wrong query patterns. Test suites run a handful of known queries. Production runs hundreds of distinct patterns with real concurrency and real data skew. The interaction between a new column and existing queries only surfaces when you replay actual traffic.
Staging tests correctness, not performance at scale. You need both. (See 100 Migrations Later for the longer argument.)
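The volume effect is easy to demonstrate outside the database. Here is a toy Python sketch (illustrative only – not PostgreSQL's planner): a sequential scan touches every row, so its work grows with table size, while an index-style lookup goes straight to the matching rows. The row shape mirrors the backfill above (`id % 5 == 0` → `'west'`).

```python
from collections import defaultdict

def make_rows(n):
    # Mirror the backfill: id % 5 == 0 -> 'west', 1 -> 'east', ...
    regions = ["west", "east", "central", "south", "north"]
    return [{"id": i, "region": regions[i % 5]} for i in range(1, n + 1)]

def seq_scan(rows, region):
    # Touches every row, like "Seq Scan on orders"
    return [r for r in rows if r["region"] == region]

def build_index(rows):
    # One-time cost, then lookups jump straight to the matches
    idx = defaultdict(list)
    for r in rows:
        idx[r["region"]].append(r)
    return idx

rows = make_rows(25000)
idx = build_index(rows)
assert seq_scan(rows, "west") == idx["west"]  # same 5,000 rows either way
# The scan inspected all 25,000 rows; the index touched only the 5,000 matches.
```

At 100 rows both paths are effectively instant, which is exactly why staging stays quiet.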
But this migration was also running through Scry.
What Scry Found
Here’s what happens when this migration runs through Scry’s pipeline:
- scry-proxy is already capturing production queries transparently – no application changes needed.
- The migration is applied to a shadow database, a CDC-replicated copy of production that maintains real data volume and distribution.
- Scry replays the captured query workload against the shadow, comparing latency before and after the migration.
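The comparison at the heart of the replay step reduces to per-pattern latency ratios. A simplified Python sketch (not Scry's actual report format or thresholds; the latencies below are illustrative numbers consistent with the scenario's 92x regression):

```python
def find_regressions(before_ms, after_ms, threshold=2.0):
    """Flag query patterns whose median latency grew by at least `threshold`x.

    before_ms / after_ms map a query pattern to its median latency in ms.
    """
    regressions = {}
    for pattern, base in before_ms.items():
        ratio = after_ms[pattern] / base
        if ratio >= threshold:
            regressions[pattern] = ratio
    return regressions

# Illustrative latencies: the region filter jumps from 2 ms to 184 ms (92x),
# while an unrelated primary-key lookup is unchanged.
before = {"SELECT * FROM orders WHERE region = $1": 2.0,
          "SELECT * FROM orders WHERE id = $1": 0.1}
after  = {"SELECT * FROM orders WHERE region = $1": 184.0,
          "SELECT * FROM orders WHERE id = $1": 0.1}
print(find_regressions(before, after))
# -> {'SELECT * FROM orders WHERE region = $1': 92.0}
```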
The replay report flags one pattern – the region filter – as a 92x latency regression. EXPLAIN ANALYZE confirms the root cause:
-- Without index: Seq Scan (184ms)
Seq Scan on orders (cost=0.00..1250.00 rows=5000 width=120)
(actual time=0.045..184.23 rows=5000 loops=1)
Filter: (region = 'west'::text)
Rows Removed by Filter: 20000
Planning Time: 0.089 ms
Execution Time: 184.67 ms
The Fix
Two indexes:
CREATE INDEX CONCURRENTLY idx_orders_region ON orders(region);
-- For queries that filter by region AND status, a composite index is even better:
CREATE INDEX CONCURRENTLY idx_orders_region_status ON orders(region, status);
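You can watch the plan flip in miniature with SQLite, which ships with Python (a stand-in for Postgres here: SQLite has no CONCURRENTLY, and its EXPLAIN QUERY PLAN text differs from EXPLAIN ANALYZE, but the seq-scan-to-index-scan transition is the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, status TEXT)")
regions = ["west", "east", "central", "south", "north"]
conn.executemany("INSERT INTO orders (id, region) VALUES (?, ?)",
                 [(i, regions[i % 5]) for i in range(1, 25001)])

q = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE region = 'west'"
plan_before = conn.execute(q).fetchall()[0][3]   # full-table scan

conn.execute("CREATE INDEX idx_orders_region ON orders(region)")
plan_after = conn.execute(q).fetchall()[0][3]    # search via idx_orders_region

print(plan_before)
print(plan_after)
```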
After applying the fix to the shadow and re-running the replay, every pattern is back to baseline:
-- With index: Index Scan (3ms)
Index Scan using idx_orders_region on orders
(cost=0.29..125.40 rows=5000 width=120)
(actual time=0.032..2.89 rows=5000 loops=1)
Index Cond: (region = 'west'::text)
Planning Time: 0.112 ms
Execution Time: 3.14 ms
184ms to 3ms. Regression eliminated before it ever touched production – no pages, no customer impact, no incident channel. In CI, the whole cycle – apply migration, replay traffic, detect regression, apply fix, re-validate – fits in a single command:
scry ci test-migration prod-db/ci-main -- alembic upgrade head
Exit code 0: safe to ship. Exit code 5: regressions detected, pipeline fails before production.
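If your pipeline wraps the command rather than calling it directly, the gate logic is just a branch on those documented exit codes. A minimal Python sketch – the stub stands in for invoking the scry ci command via subprocess:

```python
# Exit codes documented above: 0 = safe to ship, 5 = regressions detected.
SAFE, REGRESSIONS = 0, 5

def run_scry_ci() -> int:
    # Stub standing in for something like:
    #   subprocess.run(["scry", "ci", "test-migration", ...]).returncode
    return REGRESSIONS  # pretend the replay found a regression

code = run_scry_ci()
verdict = "ship" if code == SAFE else "block merge"
print(verdict)  # -> block merge
```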
A 92x regression on a core query path would have meant pages firing, customers timing out, and an engineer reverse-engineering what changed. Instead, it was a three-line fix before the PR merged.
You can replay this exact scenario locally in under two minutes:
scry demo
Run the full end-to-end test yourself
We’re looking for design partners – teams shipping migrations against production PostgreSQL who want to close the gap between staging and prod. Request early access.