21 decision records from running a multi-portal PHP/MySQL reservations platform and the tooling around it - per-tenant AI sandboxes, a server-side deploy pipeline, a self-hosted observability and IaC stack. The useful unit is not "what I used" but "what I chose, what I gave up, and when I'd choose differently." Most cluster on one seam: safely running autonomous AI agents against revenue-critical legacy systems.
Each is sanitized to the architecture level - no hostnames, credentials, or employer specifics. The source markdown lives in the infrastructure-patterns repo.
Pay image/ops overhead to get strong filesystem + DB-user isolation cheaply
Give up "one compose up" simplicity for durable, backup-friendly state
Accept some staleness to gain isolation, reset-ability, and no prod write-path risk
Forgo ecosystem features for a dependency-free, auditable single-server deploy
More host setup in exchange for clean per-action auditing and least privilege
Accept eventual consistency to keep derive-logic in versioned code, off the hot write path
Forgo deep metrics/history to keep the monitored fleet agent-free and the failure domain legible
Give up cross-host sharing for zero operational surface on state one process owns
Take on provisioning friction so a leaked credential isn't portable off its host
Pay a render/diff harness to safely change markup you can't audit by eye - it proves visual, not semantic, equality
Take on a TSDB to run for real metrics/history/alerting - and bind the unauthenticated parts to loopback, exposing only one read-only pane
Run N near-identical small services to gain physical blast-radius isolation and per-app sync-model freedom, at the cost of hand-applying shared-auth fixes across copies
Take on Terraform state and a deliberately create-only provisioning token for a single box to get a reproducible, reviewable host and a clean provision/configure/deploy seam
Accept verbose generated config and provider quirks to adopt traffic-serving records with zero downtime and a no-op baseline plan, instead of risking duplicate-creates and silent deletion of forgotten records
Take on a second vendor + scoped IAM key so a provider-level outage can't destroy both the infrastructure and the state needed to rebuild it
Run a policy controller so the hardened posture is rejected-if-violated at the API server instead of relying on review - the control that matters once a second actor or an agent can apply to the cluster
Take on one container plus its own cert and upkeep so the legally-binding document and its audit trail stay on infra you govern, with a named-human gate on every consequential action - the highest-stakes case of keeping the authoritative copy where you control it
Hold one invariant (encrypted at rest, injected into the environment at runtime) and pay for two backends - a broker where a team needs sharing/revocation/audit, SOPS + age where a solo fleet needs offline zero-vendor recovery - rather than force one tool to fit both
Gate a second internal app by delegating to one hardened passkey session via Caddy forward_auth - ~40 lines of Caddyfile, zero new auth code, every tool inheriting the gate's future hardening - at the cost of concentrating trust in a single session
Put SSH behind a WireGuard mesh and drop public :22, but keep browser admin consoles public behind per-audience MFA - size the boundary to who uses it, rather than hide the consoles non-technical staff reach by URL behind a VPN client they can't maintain
Add one collector as the ingestion seam so new telemetry sources and backends are configuration, not new architecture - agent usage metrics only, never prompt content