Decisions

Architecture decisions, with the trade-offs kept.

21 decision records from running a multi-portal PHP/MySQL reservations platform and the tooling around it - per-tenant AI sandboxes, a server-side deploy pipeline, a self-hosted observability and IaC stack. The useful unit is not "what I used" but "what I chose, what I gave up, and when I'd choose differently." Most cluster on one seam: safely running autonomous AI agents against revenue-critical legacy systems.

Each is sanitized to the architecture level - no hostnames, credentials, or employer specifics. The source markdown lives in the infrastructure-patterns repo.

ADR 0001

Docker over bare-metal for per-tenant isolation

Pay image/ops overhead to get strong filesystem + DB-user isolation cheaply
ADR 0002

External managed DB over a containerized one

Give up "one compose up" simplicity for durable, backup-friendly state
ADR 0003

Periodic snapshot over live replication for sandboxes

Accept some staleness to gain isolation, reset-ability, and no prod write-path risk
ADR 0004

A guarded shell deploy over a hosted CI runner

Forgo ecosystem features for a dependency-free, auditable single-server deploy
ADR 0005

A scoped system user over a shared service account for an autonomous agent

More host setup in exchange for clean per-action auditing and least privilege
ADR 0006

Binlog-tailing daemons over database triggers for denormalization

Accept eventual consistency to keep derive-logic in versioned code, off the hot write path
ADR 0007

Pull-based health probing over push agents for a small fleet

Forgo deep metrics/history to keep the monitored fleet agent-free and the failure domain legible
ADR 0008

Embedded SQLite over a networked DB for single-node tooling

Give up cross-host sharing for zero operational surface on state one process owns
ADR 0009

Default-deny, host-pinned DB access over a trusted network

Take on provisioning friction so a leaked credential isn't portable off its host
ADR 0010

A pixel-equality gate over diff review for changes to generated markup

Pay a render/diff harness to safely change markup you can't audit by eye - it proves visual, not semantic, equality
ADR 0011

A pull-based metrics stack over a bespoke prober, data plane kept private

Take on a TSDB to run for real metrics/history/alerting - and bind the unauthenticated parts to loopback, exposing only one read-only pane
ADR 0012

Per-app copy-deployed sync services over a shared multi-tenant backend

Run N near-identical small services to gain physical blast-radius isolation and per-app sync-model freedom, at the cost of hand-applying shared-auth fixes across copies
ADR 0013

Staged declarative provisioning (Terraform + cloud-init + separate deploy) over one imperative bootstrap script

Take on Terraform state and a deliberately create-only provisioning token for a single box to get a reproducible, reviewable host and a clean provision/configure/deploy seam
ADR 0014

Import ~220 live DNS records into Terraform over recreating them from a desired-state list

Accept verbose generated config and provider quirks to adopt traffic-serving records with zero downtime and a no-op baseline plan, instead of risking duplicate-creates and silent deletion of forgotten records
ADR 0015

Keep Terraform state on a different provider than the compute it provisions

Take on a second vendor + scoped IAM key so a provider-level outage can't destroy both the infrastructure and the state needed to rebuild it
ADR 0016

Enforce cluster posture at admission (OPA/Gatekeeper + a VAP) over trusting reviewed manifests

Run a policy controller so the hardened posture is rejected-if-violated at the API server instead of relying on review - the control that matters once a second actor or an agent can apply to the cluster
ADR 0017

Self-host the signing instrument and its audit trail over a SaaS signature service

Take on one container plus its own cert and upkeep so the legally-binding document and its audit trail stay on infra you govern, with a named-human gate on every consequential action - the highest-stakes case of keeping the authoritative copy where you control it
ADR 0018

Runtime-injected secrets over plaintext config, with the store matched to the team

Hold one invariant (encrypted at rest, injected into the environment at runtime) and pay for two backends - a broker where a team needs sharing/revocation/audit, SOPS + age where a solo fleet needs offline zero-vendor recovery - rather than force one tool to fit both
ADR 0019

Reuse an existing passkey session to gate a second app (Caddy forward_auth) over standing up a second auth stack

Gate a second internal app by delegating to one hardened passkey session via Caddy forward_auth - ~40 lines of Caddyfile, zero new auth code, every tool inheriting the gate's future hardening - at the cost of concentrating trust in a single session
ADR 0020

A private mesh for operator shells, public MFA-gated endpoints for browser consoles, over one VPN for everything

Put SSH behind a WireGuard mesh and drop public :22, but keep browser admin consoles public behind per-audience MFA - size the boundary to who uses it, rather than hide the consoles non-technical staff reach by URL behind a VPN client they can't maintain
ADR 0021

A shared OpenTelemetry collector seam over direct backend wiring

Add one collector as the ingestion seam so new telemetry sources and backends are configuration, not new architecture - agent usage metrics only, never prompt content