Six Rounds of Profiling: Cutting a 7-Second Page to 1 Second
The Tourbot manifest page is the operational document staff print before every trip - passengers, payments, waivers, payment schedules, and the full cancellation history, in one render for a group reservation. Production loads took 5-7 seconds, and it's hit constantly.
Six rounds of profile-driven tuning brought production loads to 0.9-1.1 seconds. SQL statements per page load dropped from around 2,650 to 183, and the 118 dedicated mysqli opens the hot path used to make collapsed into the shared PDO handle the page already had open. The HTML output stayed byte-identical.
The loop
Every round followed the same loop:
- Run an xdebug profile of
manifest.phpfrom the CLI via a wrapper that sets$_REQUEST, includes the page, and writes a cachegrind. - Use
callgrind_annotateplus a small Python script to pick the dominant call site or N+1 pattern. - Apply a surgical fix - whatever the profile pointed to, nothing else.
- Re-run the profile and diff the rendered HTML against the original to confirm zero behavioral change.
- Repeat.
The byte-identical diff was the cheapest possible regression test: anything that changed the output stopped the round until I understood why.
What was actually slow
The first profile was the most useful:
One hundred and eighteen fresh MySQL connections per page load. The Booking constructor called a payment-methods helper that opened a brand-new mysqli connection instead of reusing the shared PDO handle. Switching that one call to the shared handle was a 490 ms win - 12% of the request, from a single line.
Fifty-two Price_statement instances, each firing roughly thirty small queries. Each row's object independently re-fetched the same payment, accounts-pay, credit-card-fee, insurance, and item rows for the group.
A waiver-completeness check that built a full Booking object per passenger to read one column.
A reservation-history query filtering on Action LIKE '%Auto cancellation by payment notification program%'. Leading wildcards can't use an index, so MySQL fetched every IN-predicate row and applied the LIKE in memory.
The biggest lever: batched in-request prefetch
The first two rounds were small caches, request-scope memoization, and the connection fix. The third was structural.
I added a Price_statement::prefetch(array $resNums) method that runs about a dozen batched WHERE … IN (…) queries once and stores results in static caches keyed by reservation number. Each instance's per-row queries became cache lookups that fall back to the original query when the cache is cold. Manifest::__construct warms it once, before any per-passenger work. Other callers keep working.
One subtlety: the gross-total loop mutates a price-statement field on family-plan rows, so the cache returns clones, not references; otherwise it would corrupt downstream consumers. Byte-identical diffing catches exactly this immediately, not three weeks later when a manager prints a weird-totaled manifest.
Rounds 4 and 5 extended the pattern. The payment-schedule helper got prefetch too. A memoization key keyed by sub-reservation number switched to the resolved master - 52 sub-reservations sharing one master collapsed from 52 DB hits to 1. The waiver check moved its expensive lookups inside the international-trip branch that used them.
The one schema change
Round 6 was the only schema-level change. The leading-wildcard LIKE in the cancellation-history query cannot use an index, so I added an Action_Type column to tblHistory and tblHistoryArchive, with a composite index on (Action_Type, ResNum), and backfilled existing rows. One helper on every history insert derives the value automatically, so the cron jobs producing these rows classify them at write time.
The read side replaced the two Action LIKE '%…%' predicates with Action_Type = 'auto_cancel'. EXPLAIN rows examined per ResNum: 4,165 to 1.
What I'd take with me
Patterns I'd reach for again:
Profile first, and only fix what's hot. Every change here is justified by a number from a cachegrind output. The biggest win was a one-line change to a payment-methods helper, hard to find without a profile.
Byte-identical output diffing is the cheapest regression test on a read-only page. It catches cache-mutation bugs, missing edge-case branches, and subtle reordering instantly.
Batched cache plus per-row fallback is low-risk. A prefetch layer falling through to the original query let the manifest hot path use batched data while every other caller kept working. No flag day, no coordinated migration.
Memoize by the right key. A memo keyed by sub-reservation that should have been keyed by master never deduped anything, because the inputs were always distinct. Memoization only helps when the key matches the function's actual dependency surface.