One engineer, a platform of production systems.
Four-plus years as lead engineering owner for a multi-million-dollar specialty-travel operator - modernizing an inherited PHP 5 / CentOS 7 / MySQL 5 monolith into a PHP 8 / Rocky 9 / MySQL 8 platform with Docker, a Python agent-orchestration layer, and multi-tenant AI assistants.
System overview
How the pieces connect. (Architecture shown at a deliberately high level; hostnames, addresses, and vendor specifics omitted.)
The arc
Inherited legacy, modernized in production, then extended.
The business runs on Tourbot - a full-lifecycle group-travel ERP handling the sales pipeline, itinerary building, reservations, passenger logistics, payments, communications, and 70-plus reports, plus three customer- and contractor-facing portals on the same core. When I took ownership it was a PHP 5.6 / CentOS 7 / MySQL 5.7 system whose support runway was running out, and every other part of the business depended on it staying up.
Over four years I led the modernization in production, without a freeze: an enterprise-wide migration to PHP 8, MySQL 8.4, and Rocky Linux 9 across the codebase and server fleet, while still shipping features. That baseline unblocked a second wave that wasn't possible on the old stack - containerized per-manager AI sandboxes, a Python agent-orchestration VM, a fleet-wide Prometheus/Grafana/Alertmanager metrics stack, and ProspectForge (a built, review-ready in-house replacement for the CivicIQ prospecting SaaS, pending management approval) - alongside the unglamorous habits that keep a single-engineer platform safe: a guarded server-side deploy pipeline, a PHPUnit/CI suite across four PHP versions, and a file-based cron registry with a drift monitor watching 100-plus scheduled jobs.
The throughline is an engineer who inherited a legacy system and systematically turned it into a platform - more common and more credible than a string of greenfield builds.
By the numbers
Selected, verifiable outcomes from the work below.
Engineering highlights
Five representative pieces, each problem → approach → outcome.
Manifest performance: 2,650 → 183 SQL statements/request
Problem. The group-manifest page - printed before every trip - took 5-7s and fired ~2,650 SQL statements per render from per-row constructor fan-out. Approach. Six profile-driven rounds: batched in-request prefetch caches, one shared DB handle, and a targeted history-table column + composite index - each verified byte-identical against the original HTML. Outcome. ~1s loads, statements per request down ~93%, validated on a 244-sub-reservation outlier group.PCI-conscious payments
Problem. Aging card integrations and bank-file workflows needed modernizing while customers kept paying daily. Approach. Rebuilt the card flow across merchant processors with bank-account encryption, and modernized the ACH/NACHA bank-file pipeline over SFTP - kept steady through the platform migration. Outcome. Current PCI posture and reliable money movement, with continuity for in-flight registrations and payments.Multi-tenant AI assistants for managers
Problem. Non-technical managers wanted to query business data and prototype changes safely, without waiting on developer time. Approach. Per-manager sandboxed Tourbot instances (Docker, Traefik) with isolated databases refreshed every few hours from a production mirror, per-container resource limits, allowlist-constrained (default-deny) agent command execution, and AI work isolated on per-role branches with a manual promotion step. Outcome. Managers explore data and draft workflow changes through conversation; isolation and a human gate keep production safe.Observability first, then a measured 80% database win
Problem. The bespoke status prober answered "is it up?" but not "why is database CPU at 65%?" - no metrics history, paging logic embedded in a dashboard, and performance work that was anecdotal rather than measured. Approach. Stood up Prometheus, Grafana, and Alertmanager on a Terraform-provisioned droplet - node, mysqld, and blackbox exporters across all 14 hosts (host metrics, replication health, uptime, TLS expiry), with the rule-to-SMS paging chain verified live before cutting over the old pager. Then usedperformance_schema to rank 27 days of production queries by total time: three full-table-scan hot queries dominated, fixed with three composite indexes applied as online DDL (INPLACE, LOCK=NONE), EXPLAIN-verified, rollback written first.
Outcome. One paging pipeline, capacity questions answered from dashboards, and ~80% of measured query time (~404K of ~500K query-seconds) gone - with the remaining load diagnosed as application-level N+1 volume rather than papered over with hardware.
Platform migration, no downtime
Problem. PHP 5.6, MySQL 5.7, and CentOS 7 were all past or near end-of-life under a live business. Approach. Codebase-wide PHP 5→8 migration, a MySQL 5.7→8.4 plan with dual-running databases during cutover, and a CentOS 7→Rocky 9 fleet migration with per-server runbooks. Outcome. A current, supported stack across the fleet - completed without an outage staff or customers noticed.Stack & why
The reasoning, not just the list.
Open source: I upstreamed the MySQL 8.4 caching_sha2_password auth-handshake fix this stack needed into php-mysql-replication (PR #148).
Architecture decisions: the non-obvious trade-offs - tenant isolation, the data tier, the deploy pipeline, agent identity - are written up as sanitized ADRs in infrastructure-patterns.
Confidentiality: the repositories are private per employer agreement. Architecture is described here at a high level and omits hostnames, network addresses, credential mechanisms, and vendor specifics.
Read further
The reasoning, the résumé, and the rest of the work.