Six Rounds of Profiling: Cutting a 7-Second Page to 1 Second
The Tourbot manifest page is the operational document staff print before every trip — passengers, payments, waivers, payment schedules, and the full cancellation history, all rolled up in one render for a group reservation. Production loads were taking 5–7 seconds, and the page is hit constantly: every manifest review, every expense check, every pre-trip print.
I spent a few days giving it a profile-driven performance pass. After six rounds of changes, production loads are 0.9–1.1 seconds. SQL statements per page load dropped from around 2,650 to 183. The 118 dedicated mysqli opens the manifest hot path used to make collapsed into the shared PDO handle the page already had open. The HTML output is byte-identical to the original.
The loop
Every round followed the same loop:
- Run an xdebug profile of
manifest.phpfrom the CLI with a wrapper that sets$_REQUEST, includes the page, and writes a cachegrind file. - Use
callgrind_annotateplus a small Python script to pick the dominant call site or N+1 pattern. - Apply a surgical fix — whatever the profile pointed to, and nothing else.
- Re-run the profile and diff the rendered HTML against the original to confirm zero behavioral change.
- Repeat.
The byte-identical HTML diff was the cheapest possible regression test. Six rounds, six confirmations. Anything that changed the output stopped the round until I understood why.
What was actually slow
The first profile was the most useful one. It showed a few patterns that, in retrospect, were obvious once named:
One hundred and eighteen fresh MySQL connections per page load. The Booking constructor was calling a payment-methods helper that opened a brand-new mysqli connection instead of reusing the shared PDO handle the rest of the app uses. Switching that one call to the shared handle was a 490 ms win — about 12% of the request, from a single line of code.
Fifty-two Price_statement instances, each firing roughly thirty small queries. A page that displays one row per passenger had a price-statement object per row, and each object was independently re-fetching the same payment, accounts-pay, credit-card-fee, insurance, and item rows for the same group.
A waiver-completeness check that built a full Booking object per passenger just to read one column.
A reservation-history query that filtered with Action LIKE '%Auto cancellation by payment notification program%'. Leading wildcards can't use an index, so MySQL was fetching every row in the IN-predicate and applying the LIKE in memory.
The biggest lever: batched in-request prefetch
The first two rounds were small caches, request-scope memoization, and the connection fix. The third round was the structural one.
I added a Price_statement::prefetch(array $resNums) method that runs about a dozen batched WHERE … IN (…) queries once, and stores the results in static caches keyed by reservation number. Each instance's old per-row queries got replaced with cache lookups that fall back to the original per-instance query when the cache wasn't warmed. Manifest::__construct warms the cache once, before any per-passenger work starts. Non-manifest callers — cancellation analysis, payment-reminder previews, the price-statement viewer — keep working unchanged.
One subtlety: the gross-total loop mutates a price-statement field on family-plan rows, so the cache returns clones, not references. Without that, the cache would have been corrupted for downstream consumers reading from it later in the request. That kind of detail is what makes byte-identical-output diffing such a useful regression test — you find these problems immediately, not three weeks later when a manager prints a manifest with weird totals.
Rounds 4 and 5 extended the same pattern. The payment-schedule helper got the prefetch treatment too. A memoization key that had been keyed by sub-reservation number got switched to be keyed by the resolved master reservation — 52 sub-reservations sharing a single master collapsed from 52 DB hits to 1. The waiver-completeness check moved its expensive lookups inside the international-trip branch that actually used them.
The one schema change
Round 6 was the only schema-level change in the pass. The leading-wildcard LIKE in the cancellation-history query wasn't going to get faster with smarter SQL — that predicate cannot use an index. So I added an Action_Type column to tblHistory and tblHistoryArchive, with a composite index on (Action_Type, ResNum), and backfilled the existing rows. The write side — one helper that runs on every history insert — derives the right value automatically, so the cron jobs that produce these rows now classify them at write time.
The read side replaced the two Action LIKE '%…%' predicates with Action_Type = 'auto_cancel'. EXPLAIN rows examined per ResNum: 4,165 to 1.
What I'd take with me
A few patterns from this work that I'd reach for again:
Profile first, and only fix what's hot. Every change in this pass is justified by a specific number from a cachegrind output. The single biggest win was a one-line change to a payment-methods helper, and it would have been hard to find without a profile. Speculative refactors don't help.
Byte-identical output diffing is the cheapest regression test you can have on a read-only page. It catches cache-mutation bugs, missing edge-case branches, and subtle reordering instantly. I'd want this on every performance pass I do on a render path.
Batched cache plus per-row fallback is low-risk. Adding a prefetch layer with a fall-through to the original query meant the manifest hot path could use batched data while every other caller of the same classes kept working unchanged. No flag day, no coordinated migration.
Memoize by the right key. A memo keyed by sub-reservation that should have been keyed by master was technically correct but functionally useless — it never deduped anything because the inputs were always distinct. Memoization is only useful when the cache key matches the function's actual dependency surface.