The Agent’s Dictionary

AI builds it working. This makes it safe. Opinionated rules for building a secure, production-ready web app without over-engineering it. Point your coding agent here before it builds.

Core principle

Software is simple. The hard part is the idea, not the building. Build the simplest thing that ships the idea, correctly and securely, for where you are now, without cornering yourself later. Get the foundations that are cheap now and expensive later right from day one. Don’t add things you don’t yet need.

Defaults this dictionary assumes throughout: boring proven tech over clever tech; one app and one Postgres database and no cache until something measured forces otherwise; build for the next 12 months, not the next 5 years; validate input at the edges.

How an agent should use this

Follow this before and during any build.

New build, or existing app? A new build (an idea, nothing built yet) follows the build order in Foundations, reading each section’s rules as you reach that phase. An existing app (code already exists) follows “Working on an existing app” in Foundations: audit against the rules, report the gaps, then change the least needed to close them safely.
Which pathway? Establish it and state it (see “Which pathway are you on?” in Foundations): AWS, a managed host (Railway/Render/Fly), or non-dev. Default to managed if unstated, and say so.
Consult per phase, not once. Re-read the relevant section’s global ruling (always) plus the one pathway block for the chosen path at each phase: schema, auth, API, frontend, security, deploy. Reading it after you’ve written the code just documents the mistake.
The rules are binding defaults. Follow them unless the user explicitly overrides one; then the user wins. State the pathway and any override you are operating under so the user can correct you.
Fetch only what you need. This is an index plus per-section pages. Fetch the sections the task touches rather than loading everything every time. If you’d rather have the whole dictionary in one call, fetch /llms-full.txt.

Not a developer? Start here

New to this and you just want your idea live? You’re on the non-dev pathway. A “pathway” is just where your app runs: AWS (most power, most complexity), a managed host like Railway or Render (easiest for a real app with a database), or non-dev (get it live with the least fuss). When in doubt, tell your agent to use the managed pathway.

Get-live shortlist (as of June 2026; platforms change their plans, so check current pricing): deploy from a GitHub repo to Railway or Render. Both deploy straight from your repo and offer a managed Postgres. Budget a few dollars a month once you add an always-on database, not zero: the free tiers are for trying it out, not running it, since they pause the app when it’s idle or expire the database after a month. For a purely static site (no server or database), use Netlify or Cloudflare Pages, which have genuine free tiers. Pick one; don’t shop forever.

Never (the short list): don’t put passwords or API keys in your code, don’t save uploaded files on the server, don’t build your own login or payments, don’t skip database backups, and don’t put your database on the public internet. Your agent handles all of these, and the sections below tell it how.

1. Foundations

Build order (starting from an idea)

For a new build, work in this order, reading each section as you reach it. Earlier choices constrain later ones, so don’t skip ahead.

Pathway and the data-architecture questions: establish where it runs and what data is in play. These drive everything, so answer them before writing a line of schema.
Data & database: model the data, pick IDs, plan migrations.
Auth & access control: provider, sessions, RBAC, tenancy.
APIs: endpoints, validation, error shape, idempotency.
Frontend & rendering, then UI, forms & UX: only what the product needs.
Security pass: walk Security against everything you have built (validation, SSRF if you fetch URLs, secrets, IAM, uploads). This is not optional polish.
Deployment & CI/CD plus Observability: the production-ready checklist is the gate to ship.
Accessibility, SEO, performance, privacy: apply as the surfaces they cover get built, don’t bolt them on at the end.

Don’t add anything from Scaling yet. Those are deferred until a measured need appears.

Working on an existing app

When the code already exists, you are not rebuilding it to match this dictionary. You are making the smallest safe change that moves it toward the rulings, security first.

Do:

Audit before you touch anything. Check the area you’re working in against the relevant sections and list the gaps, especially security: secrets in code, missing validation, SQL built by string, public buckets, sequential IDs in URLs, no rate limiting on auth.
Report the gaps and let the user pick what to fix, rather than silently rewriting.
Fix security gaps first. Those are the ones that hurt; style and structure gaps are optional.
Make the smallest diff that closes the gap. Match the codebase’s existing patterns where they are sound, and don’t reformat or re-architect working code as a side effect.
For anything risky (schema changes, auth changes, swapping a dependency), use the safe paths already in this dictionary: expand-contract migrations, backward-compatible deploys, a tested rollback. Assume the table is large and live.

Never:

Rewrite a working module to “bring it up to standard” when the user asked for one change.
Apply a ruling in a way that breaks current behaviour. A rule that takes the app down is worse than the gap it closed.
Introduce a breaking schema or API change in one step. Expand-contract, always.

Why: Existing apps have users, data, and working behaviour. The dictionary’s value here is catching real risks and closing them without becoming the thing that broke production.

Which pathway are you on?

Do: Before building, establish the deployment pathway and state it. Ask the user: “Where will this run, AWS, a managed host (Railway/Render/Fly), or do you just want it live with the least fuss (you’re not a developer)?” If they don’t answer or don’t know, default to the Managed platform pathway and say so. For each topic, read the Global ruling (always) plus the single Pathway block matching the chosen path, and ignore the other pathways. Never: Guess the pathway silently, or apply one platform’s specifics (such as AWS RDS Proxy) on a different platform. Why: The rulings are universal, but secrets, storage, email, pooling, queues, and deploy differ by platform. Applying the wrong platform’s specifics is as bad as ignoring the rule. Escape hatch: A user who names their stack overrides everything. The default is only for when they don’t.

Before you build anything, the data-architecture questions

Do: Before writing any schema or CRUD, answer these for the data in play (and where it will run), and let the answers drive the model:

Do we even NEED to store this at all? (The cheapest data is the data you don’t keep.)
What is the data, and what shape?
How much, and at what growth rate?
Who owns the truth, what is the system of record?
How available must it be?
How sensitive is it, PII, secrets, regulated?
Can we trust it, where and how is it validated? (See the edge-validation rule.)
Who can read it, and who can write it?
How long is it kept, retention and deletion?
How does it get IN and OUT, ingest and export?
Where will this run, AWS, a managed platform (Railway/Render/Fly), or non-dev (just get it live)? This selects the pathway (see “Which pathway are you on?”).

Never: Jump straight to a table plus generated CRUD because the entity “obviously” needs storing. Why: Schema, indexes, access control, and retention are nearly impossible to retrofit cleanly once data exists; these answers are the design, not paperwork before it.

Before (agent cold):

-- "We have users, so:"
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email TEXT,
  password TEXT,
  ssn TEXT,
  created_at TIMESTAMP DEFAULT now()
);
-- + auto-generated create/read/update/delete for every column

After (this dictionary):

Q1 Need it? Yes, system of record for accounts.
Q4 Owner: us. Q6 Sensitivity: email = PII; SSN = regulated, do we even need it? No. Drop it.
Q7 Trust: email validated at the API edge. Q8 Access: user reads own row; only auth service writes credentials.
Q9 Retention: delete 30 days after account closure.

CREATE TABLE users (
  id            uuid PRIMARY KEY DEFAULT uuidv7(),  -- time-ordered PK; PG 18+ (see note)
  email         citext NOT NULL UNIQUE,             -- PII, validated at edge
  password_hash text   NOT NULL,                    -- never store plaintext
  created_at    timestamptz NOT NULL DEFAULT now(),
  deleted_at    timestamptz                         -- soft-delete drives 30-day retention job
);
-- No SSN column: the question "do we need it?" removed an entire liability.
-- uuidv7() is built in on PostgreSQL 18+ (GA Sep 2025) and is time-ordered, so it stays
-- index-friendly as a primary key. On PG < 18 use gen_random_uuid() (UUIDv4, built in since
-- PG 13, no extension), correct but randomly distributed, which fragments the PK index.

The boring-tech default

Do: Choose proven, widely-deployed technology with well-understood failure modes over new or clever technology. Default to Postgres, a monolith, and a managed auth provider. Never: Reach for the novel database/framework/runtime because it’s interesting or benchmarks well in a blog post. Why: Novelty is a real cost paid in unknown failure modes, thin documentation, and a small pool of people (and answers) when it breaks at 2am. Escape hatch: Adopt the new thing only when the boring option provably cannot do the job, measured, not assumed.

2. Project setup

Getting started: where to deploy

Global (every pathway):

Do: Deploy from a Git repo via CI to a platform that runs your app as a stateless container (or static bundle) next to a managed Postgres. The deploy must be repeatable and one-command; the database must be managed (automatic backups + point-in-time recovery you don’t hand-roll). Choose the platform by the pathway, not by hype. Never: Deploy by SSH-ing into a box and editing files, run Postgres on the same disk as the app with no backups, or pick a platform you can’t get logs and a one-click rollback from. Why: Where you deploy decides the how of half this dictionary (pooling, secrets, storage, email, jobs); fix the pathway up front so every later choice has an answer.

Pathway: AWS

Do: Run the container on ECS on Fargate (use ECS Express Mode for the simplest single-service path); managed Postgres on RDS (or Aurora); an ALB for TLS and health checks. Define infrastructure as code (CDK/Terraform) once you run more than one service. Never: Start on raw EC2 you patch by hand, or reach for EKS/Kubernetes for a single app. Why: ECS on Fargate gives containers without managing servers; RDS gives the managed-database guarantees. (Avoid App Runner, it stopped taking new customers in 2026; ECS Express Mode is its replacement.)

Pathway: Managed (Railway / Render / Fly)

Do: Point the platform at your GitHub repo and let it build, run the container, and provision its managed Postgres. Use the DATABASE_URL it gives you; it does rolling deploys and health checks for you. Never: Bring your own orchestration here, use what the platform provides. Why: These platforms collapse deploy + database + TLS + rollout into git push; that is the whole point of choosing one.

Pathway: Non-dev (just get it live)

Agent: Put the code in a GitHub repo, provision managed Postgres on the chosen platform, wire DATABASE_URL, deploy from the repo, and confirm automatic backups are on. Tell the user: “Go to railway.app (or render.com), sign in with GitHub, click New Project → Deploy from your repo, and add a Postgres database from their menu. It then redeploys automatically every time we push.” Never tell the user to copy files onto a server or run something on their own computer to keep the site up.

Which database

Do: Default to PostgreSQL for essentially everything: relational data, JSON via jsonb, full-text search (tsvector + GIN index), geo via PostGIS, and queues at small scale (SELECT ... FOR UPDATE SKIP LOCKED over a jobs table with a composite index on (status, created_at)). Never: Reach for MongoDB, DynamoDB, Elasticsearch, or a dedicated queue broker as the first datastore because the data “feels” document-y or because scale is anticipated. Why: One engine you know deeply beats four you half-know; Postgres covers the long tail (JSON, search, geo, queue) well enough that the second datastore is almost never needed in the first 12 months. Escape hatch: A genuine, specific, measured need, heavy document workloads, true high-volume time-series, or a search-first product, justifies a specialised store. Even then, start on Postgres and let measurement force the move (see Source of truth: derived stores must rebuild from Postgres).

Monolith vs services

Do: Ship one deployable monolith, organised internally by domain (see Repo structure). Split only when a concrete force demands it: a hot path that must scale independently, a hard team-ownership boundary, or genuinely divergent runtime/compliance needs. Never: Start with microservices “to be ready to scale.” Why: Premature services buy a distributed system’s failure modes, network partitions, partial failures, distributed transactions, deploy choreography, without the scale that would justify paying for them. Escape hatch: When you do split, split along one of the named boundaries above, not by technical layer.

Language / runtime

Do: Use the language the team already ships in, pinned to one runtime version across the whole codebase (lock it in the project manifest and CI, e.g. .nvmrc / .python-version / go.mod). No second language without a hard, named reason. Never: Adopt a new or trendy language/runtime for a production system just to learn it; run a polyglot stack because each piece is “the right tool for the job”; or let local, CI, and prod drift onto different runtime versions. Why: The runtime is the floor everything else stands on, not where you innovate. Version drift between environments is a top source of “works on my machine” bugs, so pin it once, everywhere.

ORM vs raw SQL vs query builder

Do: Use a mature ORM or query builder for ordinary app CRUD; drop to parameterised raw SQL for the few complex or performance-critical queries. Never: Hand-concatenate SQL strings or interpolate user input into a query (injection); never let the ORM silently emit N+1 queries, eager-load or batch instead (see N+1). Why: ORMs kill boilerplate and keep parameterisation automatic; raw SQL keeps the hard 5% readable and fast. Use each where it wins. Whatever you pick, its schema defaults won’t match the data rules here, so check them (see “Check your ORM’s defaults” in Data). Escape hatch: Raw SQL is still parameterised SQL, pass values as bind parameters, never via string formatting, even when you’ve left the ORM.
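
The difference between string-built SQL and bind parameters can be shown in a few lines. This is a minimal sketch, assuming Python’s built-in sqlite3 module as a stand-in for Postgres (the placeholder there is ?, while Postgres drivers typically use %s or $1); the users table and the two helper functions are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)


def find_unsafe(email: str) -> list:
    # String-built SQL: the input becomes part of the statement text.
    return conn.execute(
        f"SELECT id FROM users WHERE email = '{email}'"
    ).fetchall()


def find_safe(email: str) -> list:
    # Bind parameter: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchall()


attack = "nobody' OR '1'='1"
unsafe_rows = find_unsafe(attack)  # the injected OR clause matches every row
safe_rows = find_safe(attack)      # treated as a literal email, matches nothing
```

The same input returns every row through the concatenated query and no rows through the parameterised one, which is the whole argument for bind parameters even after you leave the ORM.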

Repo structure

Do: Start with one repository, organised by feature/domain (e.g. billing/, accounts/) rather than by technical layer (controllers/, models/, services/). Never: Split into multiple repos before there is a real team or ownership reason. Why: Feature-first layout keeps a change to one capability in one place; per-layer folders scatter every feature across the tree. Multi-repo adds versioning and cross-repo-change cost you don’t need yet.

Config & environments

Do: Read all config from environment variables, keep secrets out of the repo, and maintain separate config per stage (dev/staging/prod). Build the artifact once and promote that same artifact through stages; only config differs between them. Never: Commit config or secrets, hardcode per-environment values, or rebuild a separate artifact per stage. Why: Promoting one artifact means the binary you tested in staging is byte-for-byte the one in prod, divergence can only come from config, which is far easier to audit. Escape hatch: Secrets belong in a managed secrets store (your platform’s secret manager) injected as env vars at runtime, not in committed .env files, commit only a .env.example with empty values.
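
One way to apply this is to load every setting from the environment once at startup and fail fast when something is missing. A minimal sketch follows; DATABASE_URL comes from the platform as described above, while APP_STAGE, LOG_LEVEL, and the Settings/load_settings names are hypothetical choices for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    stage: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read config from environment variables and fail fast on gaps."""
    # APP_STAGE and LOG_LEVEL are hypothetical variable names.
    missing = [key for key in ("DATABASE_URL", "APP_STAGE") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required config: {', '.join(missing)}")
    stage = env["APP_STAGE"]
    if stage not in ("dev", "staging", "prod"):
        raise RuntimeError(f"unknown APP_STAGE: {stage!r}")
    return Settings(
        database_url=env["DATABASE_URL"],
        stage=stage,
        log_level=env.get("LOG_LEVEL", "info"),
    )


# The same code (the same artifact) runs in every stage; only the env differs.
staging = load_settings({"DATABASE_URL": "postgres://staging-host/app", "APP_STAGE": "staging"})
```

Passing the mapping in as an argument keeps the loader testable without touching the real process environment.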

Source of truth

Do: Treat the database as the single source of truth. Caches, search indexes, denormalised tables, and client state are derived copies, and you must always be able to rebuild every one of them from the database. Never: Treat a cache, index, or client-held value as canonical, or let a derived store drift with no rebuild path. Why: When (not if) a derived copy goes stale or corrupt, “rebuild from the source of truth” is the recovery plan. If the copy is the only truth, there is no recovery.

3. Data & database

IDs: UUID vs auto-increment

Do: Make every externally-visible identifier (URLs, API payloads) non-sequential. Use a UUIDv7 (time-ordered) value: either as the primary key directly, or keep an internal bigint identity key for joins plus a separate uuid external id. Never: Expose raw auto-increment integer PKs in URLs or APIs (/users/123). Don’t reach for random UUIDv4 as a PK either, its randomness destroys B-tree index locality, bloating writes and index size. Why: Sequential ids leak row counts and let anyone enumerate your data; UUIDv7 keeps ids opaque while staying index-friendly because the leading bits are time-ordered. Escape hatch: Internal-only tables that are never addressed by an outside caller can stay on plain bigint identity, the rule is about what crosses the trust boundary.

Before (agent cold):

CREATE TABLE users (
  id  serial PRIMARY KEY,
  ...
);
-- route: GET /users/123   ← enumerable, leaks "we have ~123 users"

After (this dictionary):

-- PG 18+: native uuidv7(); PG 17 or older: pg_uuidv7 extension or app-side gen
CREATE TABLE users (
  id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- internal joins
  external_id  uuid NOT NULL DEFAULT uuidv7() UNIQUE            -- the public id
);
-- route: GET /users/018f3c9a-7e6b-7c41-9a2e-2f1b6d4c8a55  ← opaque, non-sequential
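
For the “app-side gen” option on older Postgres versions, a UUIDv7 can be assembled by hand. This is a sketch under stated assumptions, not a vetted library: it follows the UUIDv7 bit layout (48-bit Unix millisecond timestamp, version, then random bits), the function name uuid7 is our own, and it makes no ordering guarantee between ids generated within the same millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a time-ordered UUID: 48-bit ms timestamp, version 7, random tail."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits
    value = (
        ((unix_ms & 0xFFFF_FFFF_FFFF) << 80)  # timestamp in the leading 48 bits
        | (0x7 << 76)                          # version 7
        | (rand_a << 64)
        | (0b10 << 62)                         # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)


earlier = uuid7(unix_ms=1_700_000_000_000)
later = uuid7(unix_ms=1_700_000_000_001)
```

Because the timestamp occupies the leading bits, ids from later milliseconds sort after earlier ones, which is what keeps the B-tree index append-friendly.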

Schema design defaults

Do: Normalise first, aim for 3NF. Default columns to NOT NULL; allow NULL only when “unknown/absent” is a real, meaningful state. Index every foreign key (see Indexing). Never: Reach for JSONB to dodge modelling. Use JSONB only for genuinely schemaless or highly variable data, not as a junk drawer for fields you were too lazy to define. Why: A real schema gives you constraints, types, and query planning; a JSONB blob gives you none of that and silently rots.

Migrations

Do: Treat every schema change as something that must run safely against a live production table with traffic on it. Assume the table is large and in use.

Adding a column WITH a constant default is safe on Postgres 11+ (the default is stored in the catalog, applied on read, no table rewrite). A volatile default such as clock_timestamp() or random() still rewrites the table. Don’t avoid defaults out of habit; the old “never add a column with a default” rule is obsolete.
Make changes additive first. Add new things; never rename or drop in the same step.
For any rename or type change, use expand-contract across separate deploys: (1) add the new column, (2) backfill in batches, (3) switch the app to read/write the new column, (4) drop the old column in a later deploy once nothing references it.
Create indexes with CREATE INDEX CONCURRENTLY. If it fails it leaves an INVALID index, drop it and retry.
Set a short lock_timeout (e.g. 5s) before DDL so a migration fails fast instead of queuing behind a lock and freezing the table.

Never:

ALTER COLUMN … SET NOT NULL directly on a populated table, it scans the whole table under an ACCESS EXCLUSIVE lock. Instead: ADD CONSTRAINT … CHECK (col IS NOT NULL) NOT VALID (instant), then VALIDATE CONSTRAINT (doesn’t block reads/writes), then SET NOT NULL (no scan on PG12+).
Rename a column in one step, it breaks every query using the old name. Use expand-contract.
Drop a column the current app version still reads. Deploy the code that stops using it first.
A single UPDATE backfilling millions of rows. Batch it.
Add an index without CONCURRENTLY in production.

Why: A migration that locks a busy table is an outage. Escape hatch: At 0 users / empty tables, do the simple thing. The moment real data is in the table, these rules apply.

Before (agent cold):

ALTER TABLE users ALTER COLUMN email SET NOT NULL;
-- scans the whole table under an exclusive lock and blocks all reads and writes: downtime on a large table

After (this dictionary):

ALTER TABLE users ADD CONSTRAINT users_email_nn CHECK (email IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_email_nn;   -- scans without blocking reads/writes
ALTER TABLE users ALTER COLUMN email SET NOT NULL;       -- PG 12+: trusts the validated constraint, no scan
-- each lock is held only briefly: no long block, no downtime
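
The “backfill in batches” step of expand-contract could look roughly like this. It is a sketch only: SQLite (Python’s built-in sqlite3) stands in for Postgres, and the email_normalised column, the batch size, and the function name are hypothetical. Each batch runs in its own short transaction so no single statement holds locks across millions of rows.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill users.email_normalised in keyed batches; returns batches run."""
    last_id = 0
    batches = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM users "
                "WHERE id > ? AND email_normalised IS NULL "
                "ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return batches
        conn.execute(
            "UPDATE users SET email_normalised = lower(email) "
            "WHERE id BETWEEN ? AND ? AND email_normalised IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # one short transaction per batch
        last_id = ids[-1]
        batches += 1


# Demo data: 2500 rows where the new column has been added but not yet filled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, email_normalised TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(2500)],
)
conn.commit()

batches_run = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT count(*) FROM users WHERE email_normalised IS NULL"
).fetchone()[0]
```

Walking the primary key (id > last_id) rather than using OFFSET keeps every batch an index range scan, and the IS NULL filter makes the job safe to re-run after an interruption.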

Money / decimals

Do: Store money as integer minor units (pence/cents) or as exact NUMERIC/DECIMAL. Store the currency code (ISO 4217) alongside every amount. Never: float or double for money, binary floating point cannot represent 0.10 exactly and rounding errors compound.
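
A small sketch of the integer-minor-units approach, assuming a hypothetical Money type and parse_amount helper. The exponent of 2 is an assumption that fits currencies like GBP or USD; others have a different number of minor-unit digits (JPY has none), so it would need to come from the currency in real code.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    minor: int      # integer minor units, e.g. pence or cents
    currency: str   # ISO 4217 code, stored alongside every amount

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.minor + other.minor, self.currency)


def parse_amount(text: str, currency: str, exponent: int = 2) -> Money:
    """Parse a decimal string exactly, with no binary float in the path."""
    scaled = Decimal(text).scaleb(exponent)
    if scaled != scaled.to_integral_value():
        raise ValueError("more decimal places than the currency allows")
    return Money(int(scaled), currency)


float_total = 0.10 + 0.20  # binary floats: not exactly 0.30
exact_total = parse_amount("0.10", "GBP") + parse_amount("0.20", "GBP")
```

The float sum does not equal 0.30, while the integer sum is exactly 30 minor units; the currency check stops amounts in different currencies from being added by accident.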

Timestamps & timezones

Do: Store every timestamp as timestamptz. Store and compute in UTC; convert to a local timezone only at the display/reporting edge. When a query means “today” or “this month” in a user’s local zone, truncate in that zone with the 3-arg date_trunc(field, ts, zone) (PG 12+), which returns a timestamptz you can compare directly. Never: Store naive (timestamp without tz) values, and never filter date ranges with created_at::date = current_date, that casts using the session’s TimeZone setting (normally the server’s zone) and is off by a day whenever that zone differs from the user’s. Why: UTC storage with edge conversion is the only model that survives DST shifts, multi-region servers, and users in different zones.

Before (agent cold):

created_at  timestamp,                -- naive, ambiguous zone
-- "orders today" using server-local time:
SELECT * FROM orders
WHERE created_at::date = current_date; -- wrong unless server tz == user tz

After (this dictionary):

created_at  timestamptz NOT NULL DEFAULT now(),
-- "orders today" for a user in America/New_York (PG 12+ 3-arg date_trunc):
SELECT * FROM orders
WHERE created_at >= date_trunc('day', now(), 'America/New_York')
  AND created_at <  date_trunc('day', now(), 'America/New_York') + interval '1 day';

Soft delete vs hard delete

Do: Default to hard delete, backed by foreign-key integrity (see Foreign keys & cascades) and reliable backups. Never: Add a deleted_at column reflexively. Use soft delete only when you genuinely need recovery, audit, or legal retention, and when you do, enforce “exclude deleted” globally via a view or default scope, never per-query. Why: The first place someone forgets the WHERE deleted_at IS NULL filter is a data leak that ships deleted rows to a user.

Enums

Do: Use a lookup table (foreign key) for value sets that may change or carry metadata. Use a CHECK constraint for tiny, fixed sets (e.g. status IN ('active','inactive')). Never: Use native Postgres ENUM types. Adding a value means ALTER TYPE ... ADD VALUE (which can’t be used in the same transaction it’s added in), reordering is impossible, and a value can never be removed.

Foreign keys & cascades

Do: Always declare foreign keys and let the database enforce integrity. Choose ON DELETE deliberately: CASCADE only where the child truly cannot exist without the parent; otherwise RESTRICT/NO ACTION (the default) or SET NULL. Never: Sprinkle ON DELETE CASCADE for convenience, one deleted parent row can silently wipe out half the database. Why: Cascade is irreversible and invisible until it fires; RESTRICT fails loudly and safely instead.

Indexing

Do: Index the columns you filter, join, and sort on, and every foreign key. For multi-column filters use a composite index, and remember column order matters (leftmost-prefix wins). Confirm with EXPLAIN (ANALYZE) that the planner actually uses it. Never: Index every column. Each index taxes every write and consumes storage; unused indexes are pure overhead.

N+1 queries

Do: Load related data in one query, a join, an eager load, or a batched WHERE id IN (...). Never: Issue one query per row inside a loop. The classic trap is an ORM lazily loading a relation inside a .map/forEach. Why: Query count that scales with row count turns a 1-query page into a 1000-query page under real data. Escape hatch: Detect it by logging query count per request and alerting when the count grows with the result set.
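
To make the query-count difference concrete, here is a small sketch that counts statements for both approaches. It assumes SQLite (Python’s built-in sqlite3) as a stand-in for Postgres; the authors/posts tables and the run helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Lin'), (3, 'Sam');
    INSERT INTO posts (author_id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (1, 'd');
    """
)

executed = []  # every statement issued through run() is recorded here


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    posts = run("SELECT author_id, title FROM posts ORDER BY id")
    # One extra query per row: the classic lazy-load-in-a-loop shape.
    return [
        (title, run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0])
        for author_id, title in posts
    ]


def titles_batched():
    posts = run("SELECT author_id, title FROM posts ORDER BY id")
    if not posts:
        return []
    ids = sorted({author_id for author_id, _ in posts})
    placeholders = ",".join("?" * len(ids))  # only placeholders are interpolated
    names = dict(run(f"SELECT id, name FROM authors WHERE id IN ({placeholders})", ids))
    return [(title, names[author_id]) for author_id, title in posts]


slow = titles_n_plus_one()
slow_queries = len(executed)   # 1 query for posts + 1 per post
executed.clear()
fast = titles_batched()
fast_queries = len(executed)   # 2 queries regardless of row count
```

Both functions return the same data, but the first issues one query per post while the second stays at two however many rows come back. Recording the count per request, as run() does here, is also the detection hook the escape hatch describes.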

Pagination

Do: Use keyset (cursor) pagination for large or unbounded lists, page on an indexed WHERE (created_at, id) < (:last_ts, :last_id) ORDER BY created_at DESC, id DESC LIMIT n. Never: Use OFFSET for deep or unbounded paging, it scans and discards every skipped row and drifts (skips/repeats rows) under concurrent writes. Escape hatch: OFFSET/LIMIT is fine for small, bounded admin lists where deep pages never happen.
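
A sketch of the keyset query driven from application code, assuming SQLite (Python’s built-in sqlite3, which also supports row-value comparison) as a stand-in for Postgres. The events table, the page size, and the fetch_page helper are hypothetical; timestamps deliberately collide to show why id is part of the cursor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute("CREATE INDEX events_created_id ON events (created_at DESC, id DESC)")
conn.executemany(
    "INSERT INTO events (id, created_at) VALUES (?, ?)",
    # ids 1..7; neighbouring rows share a timestamp to exercise the tie-break
    [(i, f"2026-01-01T00:00:{i // 2:02d}Z") for i in range(1, 8)],
)


def fetch_page(cursor=None, limit=3):
    """Return (rows, next_cursor); cursor is the (created_at, id) of the last row seen."""
    if cursor is None:
        rows = conn.execute(
            "SELECT created_at, id FROM events "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT created_at, id FROM events "
            "WHERE (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (*cursor, limit),
        ).fetchall()
    next_cursor = rows[-1] if len(rows) == limit else None
    return rows, next_cursor


seen_ids, cursor = [], None
while True:
    rows, cursor = fetch_page(cursor)
    seen_ids.extend(row[1] for row in rows)
    if cursor is None:
        break
```

Every row is visited exactly once even though several share a created_at value, because the (created_at, id) pair is unique and the comparison resumes strictly after the last row returned.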

Connection pooling

Global (every pathway):

Do: Put a connection pooler in front of Postgres (PgBouncer, RDS Proxy, or your framework’s built-in pool). Default to transaction-pooling mode, it gives the best connection reuse. Serverless/Lambda multiplies raw connections, so pool externally there. Never: Assume transaction mode is “fire and forget.” It still doesn’t carry true session state across statements: LISTEN/NOTIFY, session-level advisory locks (pg_advisory_lock), and a SET meant to persist across a request all break or silently pin. Switch to session mode only for the connections that genuinely need those. Why: Postgres connections are heavyweight; without pooling a burst of clients exhausts max_connections and the database stops accepting work. Escape hatch: Prepared statements work in transaction mode on PgBouncer 1.21+ (set max_prepared_statements > 0). With PgBouncer 1.22+ and Prisma Client 5.10+ you no longer need Prisma’s legacy ?pgbouncer=true flag; on older versions you do. RDS Proxy multiplexes the extended-query protocol but pins on SET and session-level prepared statements.

Pathway: AWS

Do: Use RDS Proxy as the managed pooler in front of RDS/Aurora, it multiplexes connections and rides out failovers. Reach for your own PgBouncer only if you need transaction-pooling behaviour RDS Proxy doesn’t provide. Never: Open a fresh connection per Lambda invocation straight to Postgres, that exhausts max_connections; put RDS Proxy in between. Why: RDS Proxy is the AWS-native answer to serverless connection storms (it pins on SET and session-level prepared statements, so keep those off the hot path).

Pathway: Managed (Railway / Render / Fly)

Do: Use the platform’s pooled connection string (e.g. Supabase’s transaction-mode port, or the platform’s pooled URL); default to transaction mode. Never: Run your own PgBouncer next to a platform that already pools for you. Why: The pooled URL is the supported path; transaction mode carries the same session-state caveats as the Global block.

Pathway: Non-dev (just get it live)

Agent: Use the platform’s provided (pooled) DATABASE_URL; do not deploy a separate pooler. Tell the user: “Your platform handles database connections for you, nothing to set up here.”

Check your ORM’s defaults

Do: Treat your ORM’s defaults as suggestions, not law. Before you model anything, check them against the data rules above and override where they conflict: use native timestamptz not a naive timestamp, index every foreign key (most ORMs don’t auto-index them), prefer a native database UUID type over the ORM’s string-id default, and map enums to a lookup table or checked text rather than a native database enum if the ORM reaches for one. Pin these overrides in the schema so they’re explicit and survive regeneration. Never: Assume the ORM’s generated schema follows good Postgres practice. It optimises for portability and developer convenience, which is not the same as the rules here. Why: ORMs target many databases and easy onboarding, so their defaults pick the portable, lowest-common-denominator choice (string ids, native enums, no extra indexes). Those are exactly the ones this section tells you to avoid on Postgres, and the agent will follow the default unless told otherwise, silently breaking three or four data rules before anyone notices. Escape hatch: None, this is an “always check” rule. The overrides themselves are the rules above; this entry just says the ORM won’t apply them for you.

Designing derived and recomputed state

Do: Derive computed state (counters, aggregates, projections, status rollups, anything denormalised) from an ordered, authoritative source: the base tables or an append-only log. Make the recompute deterministic and idempotent, so running it twice gives the same result, and able to rebuild the derived value from scratch at any time (it’s a cache, not a source of truth). Reconcile periodically: recompute from source, compare to the stored value, and alert on drift. Never: Mutate a derived value in place as a side effect and treat it as authoritative. Never let derived state be the only record of something: if you can’t rebuild it from the source, it isn’t derived, it’s unbacked. Why: Derived values drift. A missed update, a race, or a partial failure, and the cached count or status no longer matches reality, with no way to tell which is right. Deterministic, rebuildable recompute makes drift detectable and fixable, because you can always recompute the truth. Escape hatch: A trivial value you can compute on read doesn’t need a pipeline. Compute it on read until that’s too slow, then cache it with this pattern.
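
The recompute-and-reconcile loop can be sketched in plain Python. The event shapes, the “open orders per customer” counter, and the function names are all hypothetical; the point is only that the derived value is a pure function of the ordered source, so it can be rebuilt and compared at any time.

```python
from collections import Counter

# Authoritative, append-only source (hypothetical order events, in order).
events = [
    {"order_id": 1, "customer": "a", "kind": "placed"},
    {"order_id": 2, "customer": "a", "kind": "placed"},
    {"order_id": 2, "customer": "a", "kind": "cancelled"},
    {"order_id": 3, "customer": "b", "kind": "placed"},
]


def recompute_open_orders(source):
    """Deterministic rebuild: the same log always yields the same counts."""
    latest = {}
    for event in source:  # later events for an order overwrite earlier ones
        latest[event["order_id"]] = (event["customer"], event["kind"] == "placed")
    counts = Counter(customer for customer, is_open in latest.values() if is_open)
    return dict(counts)


def reconcile(stored, source):
    """Recompute from source and report drift as {key: (stored, truth)}."""
    truth = recompute_open_orders(source)
    drift = {
        key: (stored.get(key, 0), truth.get(key, 0))
        for key in set(stored) | set(truth)
        if stored.get(key, 0) != truth.get(key, 0)
    }
    return truth, drift


stored_counts = {"a": 2, "b": 1}  # cached value that missed the cancellation
rebuilt, drift = reconcile(stored_counts, events)
```

Here the cached count for customer "a" missed a cancellation; the reconcile pass surfaces the mismatch and already holds the value to overwrite it with, which is what makes drift fixable rather than merely detectable.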

4. Auth & access control

Auth: provider vs roll-your-own

Do: Use a managed auth provider (Cognito, Clerk, Auth0, Supabase Auth, WorkOS). Let it own login, password reset, MFA, email verification, and session management. Never: Hand-roll login, password reset, MFA, or session logic for a new app. Why: “Simple” auth has dozens of subtle ways to be insecure (timing attacks, token replay, reset-link reuse, account enumeration), and you will not maintain it as well as a provider. Escape hatch: Essentially none for a new app. Self-host an existing battle-tested system (e.g. Keycloak) only if a hard compliance or air-gap requirement forbids SaaS, still not custom code.

Token / session storage

Do: Keep the session token in an httpOnly, Secure, SameSite=Lax cookie. For SPAs, hold the short-lived access token in memory (a JS variable, not web storage) and the refresh token in an httpOnly cookie. Because httpOnly cookies are auto-sent, add CSRF protection (a custom header the server requires, or double-submit token). Never: Store session, access, or refresh tokens in localStorage or sessionStorage. Never set SameSite=None without Secure, and never rely on the browser’s default SameSite. Why: Any XSS can read web storage; httpOnly cookies are unreadable from JS, which contains the blast radius. The trade-off is CSRF, which SameSite plus a required custom header closes. Escape hatch: A native mobile app uses the platform secure keystore (Keychain / Keystore), not web storage rules. Use SameSite=Strict when no cross-site navigation needs the cookie; otherwise keep the explicit Lax set above.
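
As an illustration, the cookie attributes and the custom-header check could be sketched with Python’s standard library alone. The cookie name session, the X-CSRF-Token header name, and both helper functions are assumptions for the example; a managed auth provider or your framework would normally set these for you.

```python
from hmac import compare_digest
from http.cookies import SimpleCookie


def session_cookie_header(token: str) -> str:
    """Build the Set-Cookie value with every attribute stated explicitly."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["httponly"] = True    # not readable from JavaScript
    morsel["secure"] = True      # only sent over HTTPS
    morsel["samesite"] = "Lax"   # explicit, never the browser default
    morsel["path"] = "/"
    return morsel.OutputString()


def csrf_ok(method: str, headers: dict, expected_token: str) -> bool:
    """Require a matching token in a custom header on state-changing requests."""
    if method in ("GET", "HEAD", "OPTIONS"):
        return True
    sent = headers.get("X-CSRF-Token", "")  # hypothetical header name
    return compare_digest(sent.encode(), expected_token.encode())


header_value = session_cookie_header("abc123")
allowed = csrf_ok("POST", {"X-CSRF-Token": "t0ken"}, "t0ken")
blocked = csrf_ok("POST", {}, "t0ken")
```

A cross-site form post cannot attach the custom header, so the cookie alone, although auto-sent, is not enough to pass the check.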

Password handling

Do: If you must store passwords, hash with Argon2id. OWASP’s minimum is m=19 MiB, t=2, p=1; prefer a stronger setting such as m=64 MiB, t=3, p=1, tuned to ~100ms/hash on your hardware. bcrypt at cost ≥12 is an acceptable fallback. See “Auth: provider vs roll-your-own”, a provider should own this entirely. Never: Store plaintext, and never use fast/general-purpose hashes (MD5, SHA-1, SHA-256, plain HMAC) for passwords. Never feed bcrypt inputs over 72 bytes without pre-hashing, it silently truncates. Why: Fast hashes are trivially brute-forced; password hashing must be deliberately slow and memory-hard.

Authorization / RBAC

Do: Model roles and permissions explicitly. Enforce authorization server-side on every request, in a centralised middleware/policy layer, and deny by default. Never: Rely on a hidden UI element, a disabled button, or any client-side check as the access control. Never trust an is_admin/role claim sent from the client, derive it server-side from the authenticated identity. Why: The client is attacker-controlled; the only enforcement that exists is the one on the server.
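
A deny-by-default policy check might be sketched as follows. The role names, permission strings, and the in-memory ROLE_BY_USER_ID lookup are hypothetical stand-ins for whatever your database and auth provider supply; what matters is that the role is derived server-side and anything not explicitly granted is refused.

```python
from typing import Optional

# Explicit role -> permission model (hypothetical names).
PERMISSIONS = {
    "admin": {"invoice:read", "invoice:write", "user:manage"},
    "member": {"invoice:read"},
}

# Server-side record of roles, standing in for a database lookup
# keyed by the authenticated identity.
ROLE_BY_USER_ID = {"u1": "admin", "u2": "member"}


def is_allowed(user_id: str, action: str, client_claims: Optional[dict] = None) -> bool:
    """Central policy check. client_claims is deliberately ignored."""
    role = ROLE_BY_USER_ID.get(user_id)            # derived server-side
    return action in PERMISSIONS.get(role, set())  # unknown role or action: deny


admin_can_write = is_allowed("u1", "invoice:write")
spoofed_member = is_allowed("u2", "invoice:write", client_claims={"role": "admin"})
unknown_user = is_allowed("u9", "invoice:read")
```

The spoofed role claim changes nothing because it is never read, and an unknown user or an unlisted action falls through to a refusal without needing a special case.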

Row-level security + connection pooler

Do: When combining Postgres RLS with per-request tenant context, set the context as transaction-local with SET LOCAL (or set_config(..., true)) inside an explicit transaction, and verify isolation under the real pooler. Never: SET app.current_tenant (session-level) on a connection drawn from a transaction-mode pooler (PgBouncer pool_mode = transaction, Supabase’s transaction port; RDS Proxy pinning aside). Never use statement-mode pooling with session GUCs at all: even SET LOCAL can land on a different connection. Why: Pooled connections are reused across requests; a session-level variable can leak from one tenant’s request to the next, turning the isolation feature into a cross-tenant data leak. SET LOCAL is discarded at COMMIT/ROLLBACK, so it cannot outlive the transaction that owns the pooled connection. Escape hatch: Session-level SET is only safe if the connection is exclusively held for the request’s lifetime (e.g. a dedicated, non-transaction-pooled connection, or pool_mode = session) and reset on checkout. Confirm; don’t assume.

Before (agent cold):

-- per request, on a transaction-pooled connection (e.g. Supabase/PgBouncer)
SET app.current_tenant = '42';
SELECT * FROM invoices;  -- relies on RLS using app.current_tenant
-- connection returns to pool STILL set to '42'; next tenant inherits it

After (this dictionary):

BEGIN;
SET LOCAL app.current_tenant = '42';   -- scoped to this transaction only
SELECT * FROM invoices;                 -- RLS sees the correct tenant
COMMIT;                                  -- variable is discarded with the txn
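
From application code the tenant id should travel as a bound parameter, and SET LOCAL does not accept one; set_config(name, value, true) is the parameterisable equivalent. A minimal sketch, assuming a node-postgres-style client whose query(text, values) returns a promise. The helper name withTenant and the client shape are illustrative assumptions, not part of this dictionary.

```javascript
// Hypothetical helper: runs `work` inside one transaction with the tenant set
// transaction-locally. `client` is assumed to be a single checked-out
// connection exposing query(text, values), as node-postgres clients do.
async function withTenant(client, tenantId, work) {
  await client.query("BEGIN");
  try {
    // set_config(..., true) is the bind-parameter form of SET LOCAL.
    await client.query(
      "SELECT set_config('app.current_tenant', $1, true)",
      [String(tenantId)]
    );
    const result = await work(client);
    await client.query("COMMIT");   // the setting is discarded with the transaction
    return result;
  } catch (err) {
    await client.query("ROLLBACK"); // also discards the setting
    throw err;
  }
}
```

Every query that depends on RLS has to run through the same client inside work; a query issued on a different pooled connection would not see the tenant setting.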

Multi-tenancy

Do: Default to a shared database with a tenant_id column on every tenant-owned row, enforced in every query and ideally backed by RLS (see “Row-level security + connection pooler”). Never: Reach for a separate database or schema per tenant by default. Why: Per-tenant databases multiply migration and operational cost linearly with customers. Escape hatch: Use a separate database/schema per tenant only when hard isolation or a specific compliance requirement demands it.

5. Security

Input validation

Do: Parse all external input at the edge with a schema (zod, Pydantic, class-validator). Reject unknown fields (.strict() in zod, extra="forbid" in Pydantic); pass typed, trusted data inward. Never: Trust the client, query string, headers, path params, or webhook bodies. Don’t sprinkle ad-hoc if (!x) throw checks deep in business logic. Why: One validated boundary means everything inside is typed and safe; scattered checks always miss a path.

SSRF / user-supplied URLs

Do: For any feature that fetches a user-supplied URL (scraper, webhook tester, image/avatar proxy, import-from-URL), defend in depth: allowlist schemes (http/https) and, where you can, hosts; resolve DNS and reject if any resolved address is private/loopback/link-local/unique-local/CGNAT for BOTH IPv4 and IPv6, re-resolving after every redirect; disable redirects or re-validate each hop; run the fetcher with least-privilege egress and no ambient credentials, isolated from credentialed infra. Where the platform supports it, also block the metadata endpoint at the network layer (e.g. enforce IMDSv2 with hop limit 1). Never: fetch(userUrl) directly; validate the hostname once and then follow redirects blindly; or check only the first resolved address. Why: A raw fetch can be pointed at cloud metadata (169.254.169.254) to steal task-role credentials, or at internal services behind your perimeter. Hostname-string blocking is bypassed via DNS rebinding, redirects, decimal/hex/octal IPs, IPv4-mapped IPv6, and multi-record DNS; you must block on every resolved address, not the name. Escape hatch: If the destination set is fixed and known (e.g. a single partner’s API), a strict host allowlist alone is enough.

Before (agent cold):

// fetches whatever the user gives us
const res = await fetch(userUrl);
const body = await res.text();

After (this dictionary):

import { lookup } from "node:dns/promises";
import net from "node:net";

function isBlockedIp(ip) {
  // Normalize IPv4-mapped IPv6 (e.g. ::ffff:a9fe:a9fe or ::ffff:169.254.169.254)
  // down to the embedded IPv4 and re-check it. new URL() may store the hex form,
  // so a string match on "::ffff:" is NOT enough.
  if (net.isIPv6(ip) && ip.toLowerCase().includes("::ffff:")) {
    const tail = ip.slice(ip.lastIndexOf(":") + 1);
    if (net.isIPv4(tail)) return isBlockedIp(tail);          // dotted form
    const hex = ip.toLowerCase().split("::ffff:")[1] || "";  // hex form a9fe:a9fe
    const parts = hex.split(":");
    if (parts.length === 2 && parts.every(p => /^[0-9a-f]{1,4}$/.test(p))) {
      const n = (parseInt(parts[0], 16) << 16) | parseInt(parts[1], 16);
      return isBlockedIp([24, 16, 8, 0].map(s => (n >>> s) & 255).join("."));
    }
  }
  if (net.isIPv4(ip)) {
    const [a, b] = ip.split(".").map(Number);
    return a === 0 || a === 127 || a === 10 ||                // this-host, loopback, private
      (a === 172 && b >= 16 && b <= 31) ||                    // private
      (a === 192 && b === 168) ||                             // private
      (a === 169 && b === 254) ||                             // link-local + cloud metadata
      (a === 100 && b >= 64 && b <= 127);                     // CGNAT / common k8s pod CIDR
  }
  if (net.isIPv6(ip)) {
    const v6 = ip.toLowerCase();
    return v6 === "::1" || v6 === "::" ||                     // loopback, unspecified
      /^fe[89ab][0-9a-f]:/.test(v6) ||                        // link-local (all of fe80::/10, i.e. fe80 to febf)
      v6.startsWith("fc") || v6.startsWith("fd");             // unique-local (incl. fd00:ec2::254, fd20:ce::254)
  }
  return true; // unparseable -> reject
}

async function safeFetch(userUrl) {
  const u = new URL(userUrl);
  if (u.protocol !== "http:" && u.protocol !== "https:") throw new Error("scheme not allowed");
  // all:true -> check EVERY A/AAAA record, not just the first (defeats multi-record rebinding)
  const records = await lookup(u.hostname, { all: true });
  if (records.length === 0 || records.some(r => isBlockedIp(r.address))) {
    throw new Error("blocked address");
  }
  // redirect:"error" throws on any 3xx (no redirect-time re-resolution gap); hard timeout; no ambient creds.
  // Run on a restricted-egress worker so a residual bypass can't reach metadata/internal hosts.
  return fetch(u, { redirect: "error", signal: AbortSignal.timeout(5000) });
}

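Note: even with this check there is a DNS-rebind TOCTOU window between lookup and fetch. The load-bearing control is restricted egress on the worker; treat the address check as defence-in-depth, not the sole barrier. If you must follow redirects, do not use redirect: "follow". safeFetch as written throws on any 3xx, so it never sees the Location header; switch the fetch to redirect: "manual", read Location from each 3xx response, and run the full scheme and address check on that URL before requesting it, with a cap on the number of hops.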

Secrets

Global (every pathway):

Do: Keep secrets in the platform secret store (AWS Secrets Manager / SSM Parameter Store, Vault, your host’s secret store); inject at runtime as env vars or mounted files; rotate them. Never: Commit secrets to git, bake them into Docker images, or ship them in a client bundle (anything in the browser is public). Escape hatch: Local dev uses a .env file that is gitignored and never the source of prod values.

Pathway: AWS

Do: Store secrets in AWS Secrets Manager (built-in rotation) or SSM Parameter Store (cheaper SecureString); grant the task role read on only its own secrets; inject as task-definition secrets at runtime. Never: Put secrets as plain environment variables in the task definition, or grant secretsmanager:* on *. Why: A scoped task role keeps secrets out of images and out of every other workload’s reach (see IAM / least privilege).

Pathway: Managed (Railway / Render / Fly)

Do: Set secrets in the platform’s Environment Variables / Secrets UI (Fly: fly secrets set); reference them at runtime. Never: Commit a .env with real values or paste secrets into the build command. Why: The platform’s secret store injects them into the running process only: the supported, repo-free path.

Pathway: Non-dev (just get it live)

Agent: Set every key in the platform’s Environment Variables / Secrets settings; never in code or committed files. Tell the user: “API keys and passwords go in your platform’s ‘Variables’ / ‘Secrets’ settings page (Railway: your service → Variables; Render: Environment), never in the code, a screenshot, or a chat message. If a key ever leaks, rotate it: create a new one, delete the old.”

IAM / least privilege

Do: Scope every role to the specific actions and resources it needs, one narrowly-scoped role per workload. Never: "Action": "*" or "Resource": "*" in a production policy. Why: A wildcard role turns any app compromise into account compromise; see SSRF for how a leaked task role gets exfiltrated.

SQL injection

Do: Use parameterised queries / bound parameters for every value. Never: Build SQL by concatenating or interpolating user input, including “just this once” in a raw query. Note that table/column names can’t be bound; allowlist those against a fixed set, never interpolate from input. Why: ORMs and query builders parameterise for you; the risk reappears the instant you drop to raw SQL.
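
The split between bound values and allowlisted identifiers can be sketched as follows. This is an illustration under assumptions: the orders table, its columns, and the function name buildOrdersQuery are hypothetical, and the { text, values } shape follows the convention node-postgres uses for parameterised queries.

```javascript
// Hypothetical allowlist: the only columns a caller may sort by.
const SORTABLE_COLUMNS = new Set(["created_at", "total", "status"]);

// Builds a parameterised query object ({ text, values }) for listing orders.
function buildOrdersQuery(customerId, sortBy, direction) {
  if (!SORTABLE_COLUMNS.has(sortBy)) {
    throw new Error("sort column not allowed");
  }
  // Direction is mapped to a fixed keyword, never copied from input.
  const dir = direction === "desc" ? "DESC" : "ASC";
  return {
    // The identifier comes from the allowlist; the value stays a bound parameter.
    text: `SELECT id, total, status FROM orders WHERE customer_id = $1 ORDER BY ${sortBy} ${dir}`,
    values: [customerId],
  };
}
```

The only strings that ever reach the SQL text are ones that appear literally in the source; anything the caller controls either selects among them or goes into values.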

XSS / output encoding

Do: Rely on your framework’s automatic template escaping (React, Jinja, ERB, etc.). If you must render user-supplied HTML, sanitise it with DOMPurify (use isomorphic-dompurify for SSR) before rendering. Set a restrictive Content-Security-Policy. Never: Concatenate user data into HTML, or pass it to innerHTML / dangerouslySetInnerHTML unsanitised. Why: Auto-escaping is on by default; the only XSS you ship is the escaping you deliberately bypass.

CORS

Do: Default to same-origin. If cross-origin is required, allowlist specific known origins and echo back only a match. Never: Reflect an arbitrary Origin header, and never combine Access-Control-Allow-Origin: * with Access-Control-Allow-Credentials: true. Why: Reflecting the origin (or * with credentials) lets any site make authenticated requests as your user; it’s same-origin policy with the lock taped open.
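
"Echo back only a match" might look like the sketch below. The origins in the allowlist and the function name corsHeaders are placeholders; in practice a framework's CORS middleware configured with an explicit origin list does the same job.

```javascript
// Hypothetical allowlist of origins permitted to call the API cross-origin.
const ALLOWED_ORIGINS = new Set([
  "https://app.example.com",
  "https://admin.example.com",
]);

// Returns the CORS response headers for a request's Origin header value.
function corsHeaders(requestOrigin) {
  // Vary: Origin keeps shared caches from reusing one origin's response for another.
  const headers = { Vary: "Origin" };
  if (typeof requestOrigin === "string" && ALLOWED_ORIGINS.has(requestOrigin)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin; // exact match only
    headers["Access-Control-Allow-Credentials"] = "true";
  }
  return headers; // no match: no CORS headers, so the browser blocks the read
}
```

The comparison is an exact string match against a fixed set. Prefix, suffix, or regex matching on the Origin value is where look-alike origins such as https://app.example.com.evil.test tend to slip through.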

Rate limiting

Do: Rate-limit at the edge (gateway/CDN/proxy), then add per-identity limits on expensive or abuse-prone endpoints (login, signup, password reset, search, write APIs). Return 429 when exceeded. Never: Rely on client-side throttling, or leave auth endpoints unlimited. Escape hatch: Brute-forceable endpoints (login, OTP, reset) need per-account and per-IP limits even if global edge limits exist.
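
The per-identity layer can be sketched as a fixed-window counter. This is an in-memory illustration only: createRateLimiter, its option names, and the key format are assumptions, state is per-process, and a deployment with more than one instance would need the counters in a shared store.

```javascript
// Hypothetical in-memory fixed-window limiter. State is per-process, so this
// only illustrates the logic; it also never evicts old keys.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }
  return function check(key) {
    const t = now();
    let w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 };   // start a fresh window for this key
      windows.set(key, w);
    }
    w.count += 1;
    if (w.count > limit) {
      const retryAfterSec = Math.ceil((w.start + windowMs - t) / 1000);
      return { allowed: false, status: 429, retryAfterSec };
    }
    return { allowed: true };
  };
}
```

For a login endpoint the same limiter would be consulted twice per request, once with a per-IP key and once with a per-account key, and the request refused if either check fails.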

User-influenced state is an abuse surface

Do: When one user’s input can change another user’s state or shared state (votes, reports, flags, reactions, ratings, challenges), design for abuse from the start: rate-limit per actor, keep any single actor’s action bounded in effect, require authorisation for it, prefer reversible and auditable changes over instant irreversible ones, and make outsized effects (a demote, a ban, a takedown) need more than one signal or a review step. Never: Let a single unverified actor trigger an outsized or irreversible effect on shared or another user’s state in one action. Never assume inputs are honest because most are. Why: Any user-influenced state change is an abuse surface: one bad actor with one action shouldn’t be able to grief, brigade, or destroy. This is rate limiting and authorisation applied to multi-user actions (see Rate limiting, and Authorization / RBAC in Auth). Boundary: Full vote-integrity, Sybil resistance, proof-of-uniqueness, and content-moderation system design are specialist fields beyond this dictionary (see The boundary). This is the general principle, not a trust-and-safety platform.

File uploads

Do: Validate type by actual content (magic bytes), not the extension or Content-Type; enforce a size cap; generate your own filename; store uploads in object storage off the app server; serve and accept via signed URLs. Never: Trust the supplied extension/Content-Type, keep the client’s filename, or write uploads into a web-served or executable path. Why: A .jpg that is really a .php/.html dropped in a public directory becomes remote code execution or stored XSS.
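
Checking content rather than the declared type can be sketched with a small signature table. The set of accepted types and the function names are assumptions; the byte signatures are the standard leading bytes for PNG, JPEG, and PDF.

```javascript
// Leading byte signatures for the upload types this hypothetical app accepts.
const SIGNATURES = [
  { type: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Returns the detected type from the leading bytes, or null if none match.
// `data` is a Uint8Array (a Node Buffer also works).
function sniffType(data) {
  for (const { type, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return type;
    }
  }
  return null;
}

// Applies the size cap first, then the content check.
function acceptUpload(data, maxBytes) {
  if (data.length > maxBytes) return { ok: false, reason: "too large" };
  const type = sniffType(data);
  if (type === null) return { ok: false, reason: "unsupported type" };
  return { ok: true, type };
}
```

A matching signature only says how the file begins, so it complements the other rules (own filename, object storage, no executable path) rather than replacing them.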

Dependencies / supply chain

Do: Commit a lockfile, pin versions, install only from official registries, and run automated dependency/vulnerability scanning (Dependabot/Renovate plus an npm audit/pip-audit step) in CI. Never: Add a dependency for a few lines you can write yourself, or install from an arbitrary git URL/tarball. Why: Every dependency is code you run with your privileges; fewer, pinned, scanned deps shrink the attack surface.

Dependency cooldown

Do: Don’t install brand-new releases the moment they publish. Enforce a release-age cooldown so a version must be a few days old before it’s installable: min-release-age (npm), minimumReleaseAge (pnpm, Bun), or npmMinimalAgeGate (Yarn). A week is the cautious setting, a day is the practical floor. Keep committing lockfiles, use npm ci or frozen installs, and consider disabling install scripts in CI. Never: Auto-adopt the latest release the instant it publishes, especially via an agent that silently bumps versions. Why: Most malicious releases of popular packages are caught and pulled within hours, so even a one-day delay filters them out at the install layer. Agent-written code makes this worse, because it’s hard to track which versions got pulled in. Escape hatch: Fast-track a genuine emergency security fix past the cooldown for that one package; the cooldown is for routine upgrades.

Build-script hardening

Do: Know that several modern package managers (pnpm since v10, Bun) disable dependency build and lifecycle scripts by default, which is good hardening. When a legitimate package genuinely needs its build step (native modules, engine downloads, bundler binaries), allowlist exactly those trusted packages rather than re-enabling scripts globally, via the package manager’s allowlist (pnpm onlyBuiltDependencies, Yarn dependenciesMeta, npm’s selective approval). Never: Globally re-enable all build scripts to fix one package’s failed install; that throws the hardening away for every dependency. Never disable the cooldown or age-gate to rush a routine upgrade. Why: Install-time build scripts are the main way a malicious package runs code on your machine, so disabling them by default is right. But it silently breaks packages that really need a build step, and the lazy fix (re-enable everything) reopens the hole. Allowlist the few you trust.

6. Building AI features

If your app calls a model (an LLM or similar), that model is the newest untrusted, non-deterministic, metered dependency in your stack. Treat it like one. These rulings are global; where a mechanism differs by provider, the provider is named as a current example, not a default. See also: input validation and SSRF (Security), data sent to third parties (Privacy), output rendering (XSS in Security), and prompt regression tests (Testing).

Treat model output as untrusted input

Do: Validate every model response against a strict schema (zod, Pydantic) before you act on it. Use the provider’s structured-output or JSON mode where it exists, then validate anyway. This is the “validate at the edge” rule from Security, applied to the model. Never: Branch on, execute, store, or forward raw model text as if it were trusted. Never eval it, never build SQL, shell, or HTTP calls out of it, and never assume a field exists just because you asked for it. Why: A model is a probabilistic text generator, not a contract. It will eventually return malformed JSON, an extra field, a refusal, or injected instructions. Unvalidated model output is the new unvalidated user input. Escape hatch: Display-only text with no downstream action still needs output encoding (see “Render model output as untrusted”), but not schema validation.

Before (agent cold):

const action = JSON.parse(completion);
doThing(action.target);

After (this dictionary):

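let raw;
try { raw = JSON.parse(completion); } catch { return retryOrFail(); }  // malformed JSON throws before any schema check
const parsed = ActionSchema.safeParse(raw);
if (!parsed.success) return retryOrFail();   // bounded re-prompt, then fail clean
doThing(parsed.data.target);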

The model is not an authorisation boundary

Do: Enforce permissions in your own code before executing any tool or function call the model chose to make. Check the acting user’s rights against the action, every time. Never: Let the model’s decision to call a tool be the thing that authorises it. The model choosing to call deleteAccount is a request, not a permission. Why: Models can be talked into calling tools by injected content. Authorisation is a property of the user and the system, never of the model’s intent (see Authorization / RBAC in Auth).
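
One way to keep that check in your own code is a dispatcher that every model-requested tool call must pass through. A sketch under assumptions: the role names, the tool names, and the shape of call ({ name, args }) are hypothetical, and a real policy layer would usually also check the specific resource, not just the role.

```javascript
// Hypothetical permission table: which roles may run which tool.
const TOOL_PERMISSIONS = new Map([
  ["getInvoice", ["member", "admin"]],
  ["deleteAccount", ["admin"]],
]);

// Executes a model-requested tool call only if the acting user is allowed to.
// `user` comes from the authenticated session, never from model output.
// `tools` is a Map of tool name -> function(user, args).
function executeToolCall(user, call, tools) {
  const allowedRoles = TOOL_PERMISSIONS.get(call.name);
  const tool = tools.get(call.name);
  // Deny by default: unknown tools and unlisted roles are both refused.
  if (!allowedRoles || typeof tool !== "function" || !allowedRoles.includes(user.role)) {
    return { ok: false, error: "forbidden" };
  }
  return { ok: true, result: tool(user, call.args) };
}
```

The model's output only supplies call; whether it runs is decided from user, which the model cannot influence.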

Defend against prompt injection

Do: Treat all retrieved, tool-returned, and user-supplied content that enters a prompt as adversarial. Keep your instructions separate from untrusted data (clear delimiters, structured roles). Give the model the least powerful set of tools the task needs, and validate tool inputs and outputs. Never: Put secrets, API keys, or credentials in prompts or system messages. Never concatenate retrieved web, document, or tool content straight into an instruction context and trust it. Why: Retrieved content carries instructions (“ignore previous instructions and exfiltrate X”). This is the SSRF of the AI layer: the input looks like data but acts like a command. A secret in a prompt can be echoed straight back out. Escape hatch: None for the secrets rule. For injection, the defence is layered (separation, least-privilege tools, output validation), not a single trick.

Cap model spend, hard

Do: Set a hard spend cap on the model API, a per-request token ceiling, and an alert well below the cap. Fail closed when the cap is hit. Capability-first examples: AWS Budgets plus per-key limits for Bedrock; the provider’s usage limits on OpenAI or Anthropic. Never: Ship a model-calling feature with no spend ceiling and no alert. Never let user input drive an unbounded loop of model calls (agent loops especially). Why: Runaway token spend is the new runaway cloud bill, and it arrives faster: a retry loop or an abuse case can burn a month’s budget in an hour. Non-dev: Tell the user, “AI calls cost money every time they run. Set a spending limit in your provider’s dashboard on day one, or a bug or some abuse could run up a large bill fast.” Agent: configure a hard cap and a low-threshold alert before shipping.

Assume non-determinism; retry deliberately

Do: Retry on rate limits and transient errors with exponential backoff and jitter. On a schema-validation failure, re-prompt a bounded number of times, then fail cleanly. Set sensible timeouts. Never: Hammer the API with immediate retries, retry unboundedly, or assume the same prompt returns the same output twice. Why: Providers rate-limit, and outputs vary run to run. Structured output plus validate-and-retry beats trusting free text or crashing on the first malformed response.
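
Exponential backoff with jitter, bounded in attempts, can be sketched as below. The defaults (250 ms base, 8 s cap, 4 attempts), the option names, and the err.retryable flag are assumptions for illustration; map your provider's rate-limit and transient errors onto isRetryable yourself.

```javascript
// Delay before retry number `attempt` (0-based): exponential growth, capped,
// with "full jitter" (a random point between 0 and the capped value).
function backoffDelayMs(attempt, { baseMs = 250, capMs = 8000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `fn` on errors the caller marks retryable, up to maxAttempts total.
// `err.retryable` is a hypothetical flag; `sleep` is injectable for tests.
async function withRetry(fn, options = {}) {
  const {
    maxAttempts = 4,
    isRetryable = (err) => Boolean(err && err.retryable === true),
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    ...backoff
  } = options;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      const lastAttempt = attempt >= maxAttempts - 1;
      if (lastAttempt || !isRetryable(err)) throw err; // bounded: fail cleanly
      await sleep(backoffDelayMs(attempt, backoff));
    }
  }
}
```

Schema-validation failures would go through a separate, smaller re-prompt budget; this helper is only for transport-level retries.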

Containing autonomous action. The moment an agent can call tools on its own, you owe it a safety model: bound what it can do, make its actions safe to retry, keep a record of what it did, and be able to halt it fast (the kill switch lives in Observability & ops). The next two rulings are the first half of that.

Bound autonomous loops

Do: Cap every agentic task with a hard ceiling on steps and tool-calls, a wall-clock timeout, and loop or no-progress detection (the same action or state repeating means stop). Fail safe when a ceiling is hit: halt and surface it, don’t silently continue. Never: Run a tool-calling loop with no maximum steps and no timeout, or let the model decide on its own when it’s done with no external bound. Why: Capping spend handles tokens, not iteration. An agent can loop, retry, or thrash within budget and still cause harm or lock up. The bound is the seatbelt on autonomy.
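
The three bounds (step ceiling, wall-clock timeout, no-progress detection) can live in one small guard that the loop consults before each step. A sketch, assuming each action can be reduced to a comparable string signature; createLoopGuard and its option names are hypothetical.

```javascript
// Hypothetical guard an agent loop consults before every step.
function createLoopGuard({ maxSteps, timeoutMs, maxRepeats = 3, now = Date.now }) {
  const startedAt = now();
  let steps = 0;
  let lastAction = null;
  let repeats = 0;
  // Returns null to continue, or a reason string meaning "halt and surface this".
  return function shouldHalt(actionSignature) {
    steps += 1;
    if (steps > maxSteps) return "step ceiling reached";
    if (now() - startedAt > timeoutMs) return "wall-clock timeout";
    // Count consecutive identical actions as a cheap no-progress signal.
    repeats = actionSignature === lastAction ? repeats + 1 : 1;
    lastAction = actionSignature;
    if (repeats >= maxRepeats) return "no progress: same action repeated";
    return null;
  };
}
```

The calling loop would pass something like a stable serialisation of the tool name and arguments as the signature, and on any non-null reason stop, record it, and report it rather than continue.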

Model-chosen side effects must be idempotent

Do: Any side-effecting action the model triggers (charge, trade, send, write) must carry an idempotency key so a retried or duplicated tool call runs once. Reuse the rule you already have (see Idempotency in APIs); the key is generated by your deterministic code, never by the model. Never: Retry or re-issue a model-chosen side effect without an idempotency key. Never let “retry deliberately” and non-determinism combine into a double-charge, double-trade, or double-send. Why: Your own rules mean tool calls will be retried and outputs will vary, so without a key that is a duplicate real-world action. Highest stakes for balances and ledgers (see Money and ledgers).

Render model output as untrusted

Do: Encode or sanitise model output before rendering it, exactly as you would user content. If you render model-produced markdown or HTML, sanitise it with a vetted library first. Never: Pass raw model output to innerHTML or dangerouslySetInnerHTML. Why: Model output is user-influenced content, so rendering it unescaped is an XSS hole (see XSS / output encoding in Security). Non-dev: Agent only: never inject model text into the page as raw HTML.

Mind what you send, and pin the model

Do: Treat the prompt as data leaving your system. Send the minimum PII to a third-party model API, know the provider’s data-retention and training-use terms, and redact what you can (see Privacy). Pin the model version and test before moving to a new one. Never: Send regulated or sensitive data to a model API without checking it’s allowed and necessary. Never silently ride “latest”: a model update is an untested dependency upgrade that can change behaviour under you. Why: Sending data to a provider is a data-sharing decision, and a model version is a dependency like any other.

Keep an auditable, redacted trail of model calls

Do: Store the prompt and response (or a reference to them) for each model call, with secrets and PII redacted, so you can reconstruct what the model was asked and what it returned when something misfires (see Observability & ops). Never: Log raw prompts or responses that contain secrets or personal data, or run autonomous actions with no record of what drove them. Why: When a model-driven action goes wrong you need its input and output, and that trail is also the audit record for any side effect it caused (see Money and ledgers).

Test prompts like code

Do: Keep a small eval set (representative inputs with the properties a good answer must have) and run it in CI when you change a prompt, model, or schema (see Testing). Assert on shape and key invariants, not exact strings. Never: Ship a prompt or model change with no way to tell whether it got better or worse. Why: Prompts are logic. A change that helps one case quietly breaks another, and without evals you only find out in production.

7. APIs

REST defaults

Do: Use plural resource-noun URLs (/v1/orders, /v1/orders/{id}), HTTP verbs for the action (GET read, POST create, PUT/PATCH update, DELETE remove), and JSON request/response bodies. Return the right status: 200 OK (read/update), 201 Created (with Location header pointing to the new resource), 204 No Content (delete or genuinely empty body); 400 malformed (unparseable JSON, bad Content-Type, broken HTTP), 401 unauthenticated, 403 authenticated-but-forbidden, 404 not found, 409 conflict (duplicate, version/optimistic-lock clash), 422 syntactically valid but semantically rejected (field-level validation failure); 500 server fault. Version from the very first endpoint with a /v1 URL prefix. Paginate every list endpoint (see Pagination). Never: Verbs in URLs (/getOrders, /createOrder), 200-with-{ "error": ... }, or shipping unversioned so the first breaking change forces a scramble. Don’t blur 400 and 422: 400 is “I couldn’t parse this”, 422 is “I parsed it and it’s wrong”. Why: Verbs and status codes are the contract; clients, proxies, and caches act on them without reading your prose. A /v1 you never break is free; retrofitting versioning onto a live unversioned API is not. Escape hatch: A genuinely non-CRUD action (POST /v1/orders/{id}/refund) may be a POST to a sub-resource verb; that is the documented exception, not licence to verb everything.

Error handling

Do: Return one consistent error envelope on every failure path, { "error": { "code": "string_slug", "message": "human readable", "request_id": "..." } }, with the matching HTTP status (see REST defaults). Log full detail (stack, SQL, params) server-side keyed to the same correlation/request id, and return that id to the client. Never: Leak stack traces, SQL, exception class names, file paths, or raw ORM errors to clients. Never return a bare string or a different shape per endpoint. Why: A stable, machine-readable code lets clients branch without parsing prose; the request id turns “it broke” into a one-line log lookup. Leaked internals are both a support nightmare and an attacker’s map. Escape hatch: If you want a ratified standard instead of a house envelope, use RFC 9457 Problem Details (application/problem+json, with type/title/status/detail/instance), which obsoletes RFC 7807. Pick one shape and use it everywhere; do not mix.
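
A single mapping function at the top of the request pipeline is one way to guarantee the envelope on every failure path. A sketch, assuming a hypothetical AppError class for expected failures and a log function supplied by the caller; the generic message text is a placeholder.

```javascript
// Hypothetical application error carrying a stable code and HTTP status.
class AppError extends Error {
  constructor(status, code, message) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

// Maps any thrown value to { status, body } in the envelope shape.
// `log` is assumed to be a server-side logger function.
function toErrorResponse(err, requestId, log = console.error) {
  // Full detail stays server-side, keyed by the request id.
  log({ request_id: requestId, error: err });
  const known = err instanceof AppError;
  return {
    status: known ? err.status : 500,
    body: {
      error: {
        code: known ? err.code : "internal_error",
        // Unknown errors get a generic message so internals never reach the client.
        message: known ? err.message : "Something went wrong.",
        request_id: requestId,
      },
    },
  };
}
```

Only errors the application raised deliberately carry their own message to the client; anything else, including raw driver or ORM errors, collapses to the generic 500 while the full error goes to the log.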

Idempotency

Do: For unsafe, retryable operations (POST that charges, signs up, or creates), require a client-supplied Idempotency-Key header (a client-generated UUID). On first request, persist key -> (status, response body) in Postgres inside the same transaction as the side effect, with a UNIQUE constraint on the key; on any replay of the same key, return the stored result instead of re-executing. Scope keys per endpoint and per authenticated user, and expire them (24h is typical). Handle the concurrent-replay race: a second in-flight request with the same key must block or return 409, not run in parallel. Never: Assume the network won’t double-deliver. A client that times out WILL retry, and a non-idempotent charge endpoint double-charges. Never store the key only after the side effect succeeds; a crash in between leaves it replayable. Why: Timeouts and retries are normal, not edge cases; the key is the only thing that makes “did my POST land?” answerable safely. Escape hatch: Naturally idempotent verbs (GET, PUT to a known id, DELETE) need no key.
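
The three outcomes for a key (first use, concurrent replay, completed replay) can be sketched with an in-memory stand-in. This shows the control flow only: the Map replaces the Postgres table and its UNIQUE constraint, there is no expiry, and createIdempotencyStore with its method names is hypothetical.

```javascript
// In-memory stand-in for the idempotency table, to show the control flow only.
// In production the record lives in Postgres under a UNIQUE constraint and is
// written in the same transaction as the side effect.
function createIdempotencyStore() {
  const records = new Map(); // scoped key -> { state, response }
  return {
    // Claim the key before doing any work.
    begin(userId, endpoint, key) {
      // Scope per user and per endpoint, as the rule requires.
      const scoped = JSON.stringify([userId, endpoint, key]);
      const existing = records.get(scoped);
      if (!existing) {
        records.set(scoped, { state: "in_flight", response: null });
        return { action: "execute", scoped };
      }
      if (existing.state === "in_flight") {
        return { action: "reject", status: 409 }; // concurrent replay
      }
      return { action: "replay", response: existing.response };
    },
    // Record the outcome so later replays return it instead of re-running.
    complete(scoped, response) {
      records.set(scoped, { state: "done", response });
    },
  };
}
```

A handler calls begin first, runs the side effect only on "execute", then calls complete with the status and body it is about to return.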

Request validation

Do: Validate the full request (body, path params, and query string) against a schema at the boundary, before any business logic or DB call runs (see Input validation). Reject unknown/extra fields. On failure return 422 with field-level errors: { "error": { "code": "validation_failed", "fields": { "email": "must be a valid email" } } }. Never: Trust the client, reach into req.body.whatever ad hoc deep in a handler, or silently ignore unexpected fields (mass-assignment risk). Why: One boundary check means business logic only ever sees well-formed input; rejecting unknown fields stops clients from quietly setting columns you never exposed. Framework note: the schema library is stack-specific: zod (Express / Hono / Fastify / Next.js), class-validator DTOs + ValidationPipe({ whitelist: true }) (NestJS), Pydantic models (FastAPI). See the framework page for the exact wiring.

Minimal response fields

Do: Return only the fields the client needs. Define an explicit output shape per endpoint (a serializer / DTO / response schema / column select) and map to it; default to excluding everything until you deliberately add it. Never: Serialize a raw ORM entity or SELECT * straight to JSON. That leaks internal columns (password hashes, internal flags, soft-delete timestamps, other rows’ foreign keys) the instant someone adds a column, and bloats payloads. Why: An allowlisted output shape cannot accidentally leak a newly-added sensitive column; a “return the row” handler leaks it the day the column lands. Framework note: in Node/Next.js, map to a plain object or a zod .pick() output schema; in NestJS, use a response DTO + ClassSerializerInterceptor with @Expose/@Exclude; in FastAPI, use a Pydantic response_model. See the framework page.
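
The "plain object" option can be as small as an explicit field list. A sketch with hypothetical column names; the point is that a column added to the table later stays out of the response until someone adds it to the list.

```javascript
// Hypothetical public shape for a user: the only fields this endpoint returns.
const PUBLIC_USER_FIELDS = ["id", "display_name", "avatar_url"];

// Copies allowlisted fields from a database row into a fresh object.
function toPublicUser(row) {
  const out = {};
  for (const field of PUBLIC_USER_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(row, field)) out[field] = row[field];
  }
  return out;
}
```

A handler would return toPublicUser(row) rather than row, so the response never depends on which columns the query happened to select.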

Shared types between frontend and API

Do: Make the API the single source of truth for its types and derive the client’s types from it; derive request/response types from the same schema that validates them. Never: Hand-maintain a second copy of the response shape in the frontend; it silently drifts from the server the first time a field changes. Why: One definition turns a breaking API change into a compile error in the frontend, not a runtime surprise in production. Framework note: Strongest in Node/TypeScript, weaker elsewhere. In a TS monorepo, share a types package, or use tRPC (end-to-end inference, no codegen) when one team owns both ends. For cross-language or public APIs, generate clients from an OpenAPI spec the server emits (FastAPI emits OpenAPI automatically; add a generator for Node). See the framework page.

8. Frontend & rendering

Framework choice

Do: React via Next.js for app-like products (auth, dashboards, real-time). Astro for content-first/mostly-static sites (blog, docs, marketing). Pick one, app-wide. Never: Hand-roll a framework, or mix several in one app. Why: A boring, well-trodden framework gives you hiring, docs, and answered questions; a custom one gives you a maintenance burden nobody else understands. Escape hatch: Vue or Svelte if that is already the team’s stack: known beats optimal. New project with no existing stack: use the defaults above.

Rendering

Do: Match the rendering mode to the page, not the app. Static/SSG for content; SSR only where SEO or first paint actually matters; client-render only the genuinely interactive islands. Never: Ship a heavy client-side SPA for what is really a content site, or SSR a logged-in dashboard that no crawler will ever see. Why: Sending a megabyte of JS to render an article tanks load time and SEO for no benefit; SSR-ing a private dashboard burns server cost for nothing.

State management

Do: Framework-built-in state (component/context) for UI state. TanStack Query for all server data: caching, refetching, mutations, invalidation. That covers ~90% of real apps. Never: Reach for Redux-style global state until local + server state genuinely cannot cope. Don’t store fetched server data in a global store and hand-wire its cache. Why: Most “global state” is really server cache; a query library handles staleness and refetch with far less code than a hand-rolled store. Escape hatch: A small global store (Zustand for one shared blob, Jotai for many independent atoms) for truly cross-cutting client state: theme, auth session, a complex editor. Still not Redux ceremony.

Optimistic UI / perceived speed

Do: Update the UI instantly on user action and reconcile with the server in the background. Show skeletons/placeholders, not blank screens or spinners. On server rejection, roll back the optimistic change visibly and surface the error. Never: Block the UI behind a spinner for an action that almost always succeeds, or swallow a failed reconcile so the user believes a write landed when it didn’t. Why: Perceived speed is a feature. Instant feedback is the difference between an app that feels alive and one that feels broken, but a silent failed write is worse than a slow one.

Deep frontend (component architecture, design systems, animation, accessibility internals) is out of scope here; see The boundary.

9. UI, forms & UX

Validation

Do: Validate on the client for instant feedback and re-validate everything on the server as the only authority. Share the schema (zod, Valibot) across both so the rules can’t drift. Never: Treat a client-side check as a security or integrity guarantee. The browser is attacker-controlled. Why: Client validation is UX, server validation is correctness. Skip the server half and a curl request walks straight past your form (see security, data). Escape hatch: None. Even a purely internal tool gets server validation.

Inline errors

Do: Put each error next to the field that caused it, say what’s wrong and how to fix it, and tie it to the input with aria-describedby plus a live region so screen readers hear it (see accessibility). Never: Dump a single generic “Invalid input” banner at the top for a fixable field-level problem, or signal an error with red colour alone. Why: A vague banner makes the user hunt for the broken field, and specificity is the whole point of validation.

Validation timing

Do: Validate a field on blur once the user has finished with it, and re-check the whole form on submit. Clear a field’s error the moment it becomes valid. Never: Fire errors on every keystroke while someone is still typing an email or password. Why: Yelling at a half-typed field reads as the form being broken, not the input.

Preserve input

Do: Keep everything the user typed when validation fails, and return focus to the first bad field. On a server round-trip, echo the submitted values back into the form. Never: Clear the form, reset selects, or lose a long text body because one field failed. Why: Making someone retype a working answer because of an unrelated error is the fastest way to lose the submission.

Double-submit

Do: Disable the submit control the instant a mutating request is in flight and re-enable it only when the request settles. Show progress on the button itself so the click clearly registered. Never: Leave a live submit button during the request and rely on the user not clicking twice. Escape hatch: If you can’t disable in time (slow JS, no-JS fallback), the server idempotency key below is your real defence.

Idempotent submits

Do: Pair every create-style submit with an idempotency key generated client-side and honoured server-side, so a retry, refresh, or double-click resolves to one record (see api). Never: Assume a disabled button is enough. Network retries, the back-then-forward dance, and impatient reloads all bypass the UI. Why: Disabling the button prevents the common case, idempotency prevents the duplicate charge.
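A rough sketch of the server half, assuming the client sends a key it generated once when the form rendered. The Map stands in for a table with a unique constraint on the key; OrderService and its fields are hypothetical names.

```typescript
// One record per idempotency key, however many times the request arrives.
// The Map is a stand-in for a database table with a UNIQUE key column.
interface Order {
  id: number;
  item: string;
}

class OrderService {
  private byKey = new Map<string, Order>();
  private nextId = 1;

  create(idempotencyKey: string, item: string): { order: Order; replayed: boolean } {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) {
      // Retry, refresh, or double-click: return the original result unchanged.
      return { order: existing, replayed: true };
    }
    const order: Order = { id: this.nextId++, item };
    this.byKey.set(idempotencyKey, order);
    return { order, replayed: false };
  }
}

const orders = new OrderService();
const first = orders.create("form-render-7f3a", "book");
const retry = orders.create("form-render-7f3a", "book"); // same key, same order
console.log(first.order.id === retry.order.id, retry.replayed);
```

In a real database the check-then-insert has to be atomic (insert and catch the unique violation), otherwise two concurrent retries can both pass the lookup.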

Submission feedback

Do: Confirm success explicitly with a toast, a redirect, or a visibly updated view, and on failure keep the data, explain what happened, and offer a retry. Never: Return the user to a quiet, unchanged screen on success, or swallow a rejected request silently. Why: Silence after a submit reads as failure, so people resubmit. A dead error reads as a dead app.

The four states

Do: Design empty, loading, error, and success for every async view and treat them as first-class work, not afterthoughts. A view that fetches has all four. Never: Ship a blank screen while loading or a spinner that can spin forever with no timeout and no error path. Why: “Happy path only” is the single most common UI defect, and the other three states are where users actually live.

Empty & error states

Do: Make empty states explain what goes here and offer the action that fills it. Give error states a concrete way out: a retry, a back link, a support route, never a dead end. Never: Render a bare “No data” or an unhandled stack trace. Escape hatch: A truly empty list inside a richer view can be a one-line hint, not a full illustrated zero-state.

No layout shift

Do: Reserve space for incoming content with skeletons that match its shape, so the page doesn’t jump as data arrives. This is also what protects your CLS (the Core Web Vitals “good” target is 0.1 or below at the 75th percentile) (see performance). Never: Render at zero height and let content shove everything down, and don’t swap a spinner for content of a different size.

Input types & keyboards

Do: Use the correct input type so mobile gets the right keyboard and the browser gets free validation: email, tel, url, number, date, search. Set inputmode where the type alone is wrong (a numeric PIN that isn’t a number). Never: Use a plain text box for an email or a numeric code, or a number input for things like phone numbers and card numbers that aren’t quantities.

Labels & autocomplete

Do: Give every input a real <label for>, and set the right autocomplete token (email, given-name, street-address, one-time-code, current-password, new-password) so browsers and password managers fill it. Never: Use placeholder text as a label. It vanishes as soon as the user types, usually fails contrast, and is announced inconsistently by assistive tech (see accessibility). Why: Correct autocomplete tokens are the difference between a one-tap checkout and a manual retype.

Defaults & forgiving formats

Do: Prefill what you already know, pick sensible defaults, and accept input in whatever shape people naturally type it: phone numbers and cards with or without spaces, dates with slashes or dashes. Normalise on the server. Never: Reject “+44 7700 900000” because it has spaces, or force a single rigid format the user has to guess. Why: Rejecting a valid value over punctuation is a self-inflicted conversion loss.
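As an illustration of "accept any shape, normalise on the server", here is a small sketch for phone numbers. It assumes the user typed an international number starting with "+", and the 7-digit floor is an assumption of this example; national formats need a default country and a proper phone library, which this skips. normalisePhone is a hypothetical name.

```typescript
// Accept phone numbers however people type them; store one canonical shape.
function normalisePhone(input: string): string | null {
  // Spaces, dashes, dots and brackets carry no meaning, so drop them.
  const compact = input.trim().replace(/[\s\-().]/g, "");
  // E.164 shape: "+", a non-zero first digit, at most 15 digits in total.
  // The lower bound of 7 digits is this sketch's own guess, not part of E.164.
  return /^\+[1-9]\d{6,14}$/.test(compact) ? compact : null;
}

console.log(normalisePhone("+44 7700 900000")); // +447700900000
console.log(normalisePhone("not a number")); // null
```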

Required vs optional

Do: Mark required fields in both text and markup (required/aria-required), and label optional ones explicitly when most are required. Never: Disable the submit button to “enforce” validation. It strands keyboard and screen-reader users with no path forward (see accessibility). Validate and explain on submit instead.

Destructive actions

Do: Make destructive actions deliberate and scale the friction to the consequence: a single click for small reversible deletes, a typed confirmation (“type the project name”) for irreversible bulk ones. Name exactly what will happen (“Delete 3 projects and 412 tasks”). Never: Guard everything with the same vague “Are you sure?” modal, which trains people to click through it on autopilot. Why: Uniform friction is friction users learn to ignore, so it stops protecting the action that matters.

Undo over confirm

Do: Prefer a brief undo window (soft-delete, then a toast with Undo) over a confirmation dialog wherever the action can be reversed. Keep the data recoverable for the undo period. Never: Block routine reversible actions behind a modal when an undo would do. Escape hatch: Genuinely irreversible or out-the-door actions (charge a card, send the email, publish to the public) still get an explicit confirm, because there’s nothing to undo.

Optimistic updates

Do: Apply the change in the UI immediately, fire the request in the background, and on failure roll back to the prior state visibly and tell the user it didn’t stick. Never: Leave an optimistic change on screen after the server rejected it. A silent lie about saved state is worse than a slow spinner. Escape hatch: For high-stakes mutations (payments, anything irreversible) skip optimism and wait for the server before showing success.
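One way to sketch the apply, reconcile, roll back cycle, independent of any UI framework. Store, save and onError are hypothetical; in practice a query library's mutation hooks usually play this role.

```typescript
// Apply a change immediately, then keep or undo it based on the server reply.
interface Store<T> {
  get(): T;
  set(value: T): void;
}

async function optimisticUpdate<T>(
  store: Store<T>,
  next: T,
  save: (value: T) => Promise<void>,
  onError: (message: string) => void,
): Promise<boolean> {
  const previous = store.get();
  store.set(next); // the user sees the change at once
  try {
    await save(next); // reconcile with the server in the background
    return true;
  } catch (err) {
    store.set(previous); // roll back visibly
    onError(err instanceof Error ? err.message : "Could not save your change.");
    return false;
  }
}
```

This simple version restores the whole previous value, so it assumes no other change landed on the same state while the request was in flight.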

10. Accessibility

Target WCAG 2.2 Level AA. That meets or exceeds what the law references (UK Equality Act, European Accessibility Act, which has applied since June 2025, US Section 508), because 2.2 AA is backward-compatible with 2.1 AA and 2.0 AA, so hitting it covers the older versions too. WCAG 3.0 is still a working draft and not law anywhere, so don’t design against it yet. AAA is not the goal for general products.

Semantic HTML first

Do: Reach for the native element. A button is a <button>, a link is an <a> with an href, sections are real landmarks (<nav>, <main>, <header>, <footer>), and headings descend in order without skipping levels. Never: Build a clickable <div> with an onclick and bolt on role, tabindex, and key handlers to fake what a <button> does for free. Why: Native elements ship keyboard behaviour, focus, and the right accessibility role already. ARIA adds a role but zero behaviour, so hand-rolled widgets are where keyboard support quietly dies. Use ARIA only when native HTML genuinely can’t express the thing (a tablist, a live region).

Keyboard

Do: Make everything operable with the keyboard alone, keep a visible focus indicator on every interactive element, and ensure focus order follows reading order. Never: Strip outline to nothing for looks, or open a modal that traps focus with no Escape and no way to tab out. Why: No visible focus means keyboard and screen-reader users are navigating blind. If you restyle focus, restyle it brighter, never to invisible. Escape hatch: Hiding focus rings from mouse users only is fine via :focus-visible, which still shows them for keyboard users.

Labels and alt text

Do: Give every input a programmatic label (a label tied by for/id, or aria-label where no visible text exists), and write alt text that conveys the image’s purpose. Give decorative images an empty alt="" so readers skip them. Never: Rely on placeholder text as the label (it vanishes on input and many readers skip it), or write alt like “image” or “logo123.png”. Why: An unlabelled field is announced as just “edit text”, which is useless. See the auth and api sections for the form patterns this rides on.

Contrast and motion

Do: Meet WCAG AA contrast: 4.5:1 for normal text, 3:1 for large text and for UI components and graphical objects. Convey state with more than colour (icon, text, shape), and honour prefers-reduced-motion by cutting non-essential animation. Never: Signal errors or status with red/green alone (many colour-blind users can’t tell them apart), or run parallax and auto-playing motion regardless of the user’s OS setting. Why: Roughly 1 in 12 men has a colour vision deficiency, and vestibular disorders make large motion physically painful. Both are settings the browser already hands you.

Deep accessibility (full audits, screen-reader testing across NVDA/VoiceOver/JAWS, complex ARIA widgets like comboboxes and grids) is specialist work. Get the basics above right, then see The boundary.

11. Performance & Core Web Vitals

Three metrics, all judged at the 75th percentile of real users: LCP (Largest Contentful Paint) good at 2.5s or less, INP (Interaction to Next Paint) good at 200ms or less, CLS (Cumulative Layout Shift) good at 0.1 or less. A page passes only when all three are good at p75. FID is gone, replaced by INP in March 2024, and INP is the one JavaScript-heavy sites most often fail, so spend your effort there.

Measure

Do: Treat field data (real users, via the Chrome UX Report or a RUM tool) as the source of truth, and use lab tools like Lighthouse only to debug and reproduce. Set a performance budget and fail CI when a build regresses it. Never: Ship a green Lighthouse score and call it done, or report a “fast” average. Why: Lab runs use one device on one network and miss the slow phones and flaky connections that drag your p75 down. Averages hide the tail: a 200ms mean INP can still mean a quarter of your users are over 500ms. Escape hatch: Pre-launch with no traffic you have no field data, so lab numbers plus a synthetic throttled run are all you have. Switch to field data the moment real users arrive.

Watch the right number

Do: Track p75 as the pass/fail line and watch p95 to find who you’re hurting. Segment by device and country, because mobile and slow networks are where you fail. Never: Optimise the median and assume the rest follows. Why: The slow tail is real users on real hardware, and Google grades you at p75 regardless of how good your median looks.
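To make the mean-versus-percentile point concrete, here is a small sketch using the nearest-rank method (RUM tools may interpolate differently). The sample values are invented for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical INP samples in milliseconds from ten page views.
const inp = [40, 40, 50, 50, 60, 60, 70, 250, 300, 480];

console.log(mean(inp)); // 140: looks comfortably "good"
console.log(percentile(inp, 75)); // 250: fails the 200ms line
console.log(percentile(inp, 95)); // 480: the users you are hurting
```

The average passes while p75 fails, which is why the pass/fail line and the alerting should both sit on percentiles.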

JavaScript

Do: Ship less JavaScript. Code-split by route, defer or lazy-load anything not needed for the first interaction, and break up long tasks so the main thread can respond. INP is dominated by main-thread work, so cutting JS is the most direct fix. Never: Hydrate static content, ship a full client-side framework for a brochure page, or pull a heavyweight date or utility library when a few lines do. Why: Every kilobyte of JS is parsed, compiled and executed on the main thread, and hydrating markup the user can already see buys you nothing but blocked interactions and a worse INP. Escape hatch: Genuinely interactive surfaces (editors, dashboards, maps) need their JS. Send it, but split it so the rest of the page doesn’t wait, and keep the interactive island small.

Images & media

Do: Serve right-sized images in a modern format (AVIF or WebP), set explicit width and height (or aspect-ratio) on every image and embed, lazy-load everything offscreen, and eager-load only the LCP image. See the images and media section for format and responsive-source detail. Never: Lazy-load the LCP image, ship a 4000px hero to a phone, or omit dimensions. Why: A missing dimension is the classic CLS bug: the box has no reserved space, so content jumps when the asset arrives. Lazy-loading the LCP image is the classic LCP bug: you’ve told the browser to defer the one thing it’s being timed on. Escape hatch: None for dimensions. For the LCP image, if it’s a background or CSS image, preload it explicitly so the browser discovers it early.

Loading

Do: Preload the few resources critical to the first paint (the LCP image, a key font), server-render or stream the above-the-fold content so users see it without waiting for JS, and serve static assets from a CDN with long cache lifetimes and content-hashed filenames. Cross-reference api for payload shape and observability for tracing slow responses. Never: Preload everything (it just creates contention and delays the thing that matters), or block first paint on a client-side data fetch that could have rendered on the server. Why: Preload is a priority signal, not a “load faster” button. Preload more than a handful of resources and you’ve told the browser nothing is important. Content-hashed filenames let you cache static assets effectively forever and bust the cache by changing the name.

Backend

Do: Make the server fast before you make it cached. Index the queries behind your slow endpoints and kill N+1 query patterns first. See data for indexing and query shape, scaling for when caching and read replicas actually earn their keep. Never: Reach for Redis to paper over a query that’s slow because it’s missing an index or firing once per row. Why: A cache in front of a broken query hides the problem until a cache miss, a stampede or a new code path exposes it, and now you’re debugging two systems instead of fixing one query. TTFB feeds straight into LCP: a slow backend caps how fast your page can ever be. Escape hatch: Expensive aggregations and genuinely hot read paths are worth caching once the underlying query is already correct and indexed.

12. SEO & metadata

Crawlability

Do: Server-render the content you want ranked so it’s in the initial HTML response, and return real status codes (404 or 410 for gone, 301 for moved). Never: Hide primary content behind client JS, tabs, accordions, or “load more”, or ship a 200 “soft 404”. Why: Crawlers index what’s in the HTML they fetch; content that only appears after hydration or a click may never be seen.

Metadata

Do: Give every page a unique title (lead with the subject, not the brand) and a unique meta description, and set a canonical URL to collapse query-param, trailing-slash, and protocol duplicates. Never: Reuse one boilerplate title or description site-wide, or chase keyword density. The keywords meta tag is dead.

Structure

Do: Use one h1 stating what the page is, nest h2/h3 by meaning, and write link text that says where it goes. Never: Pick heading levels for font size (style with CSS), or use “click here” / “read more” as link text.

Do: Add Open Graph and X (Twitter) card tags with a correctly sized share image, publish a sitemap.xml of canonical indexable URLs referenced from robots.txt, and add Schema.org JSON-LD where it fits (articles, products, breadcrumbs). Never: Disallow your whole site by accident in robots.txt, use robots.txt to hide secrets (see security), or mark up structured data for content that isn’t actually on the page. Escape hatch: Skip structured data entirely if no schema cleanly matches the page; wrong or invented markup is worse than none.

Fundamentals

Do: Treat speed and mobile as ranking factors (indexing is mobile-first, see performance), serve over HTTPS, and use clean lowercase hyphenated URLs that stay permanent. Never: Bury text in images, or change URLs without a 301 redirect. Why: Core Web Vitals and a responsive layout feed ranking directly; broken URLs leak the link equity you already earned.

13. Images & media

Format & size

Do: Serve AVIF with a WebP fallback, sized to the dimensions the image actually renders at (with a 2x variant for high-DPI). Ship srcset so each viewport gets the right file. Never: Drop a multi-MB camera or designer original into an <img> and let the browser scale it down. Why: A 4000px JPEG squeezed into a 400px slot wastes bandwidth and trashes your LCP (see performance). The pixels you don’t show still cost the user.

Layout stability

Do: Give every image an explicit width and height (or aspect-ratio in CSS) so the browser reserves the box before the bytes arrive. Never: Leave images unsized and let content jump when they load. Why: Unsized media is the classic cause of Cumulative Layout Shift. Keep CLS at or below 0.1.

Loading

Do: Lazy-load offscreen images (loading="lazy") and eager-load only the LCP image, with fetchpriority="high" and a preload so it lands first. Never: Lazy-load the hero or above-the-fold image. That delays the very paint Core Web Vitals measures. Escape hatch: For a carousel, eager-load the first slide and lazy-load the rest.

Pipeline

Do: Put images through a CDN or optimizer (Cloudinary, imgix, or your framework’s image component) that handles format negotiation, resizing and caching from the source asset. Never: Hand-export a wall of fixed-size files in Photoshop and commit them. They go stale and never cover every device. Why: On-the-fly transforms mean one source of truth and the right variant for every request.

User uploads

Do: Store uploads in object storage (S3 or equivalent), let the client PUT directly via a short-lived signed URL, and run resizing, transcoding and virus scanning asynchronously off the request path. See common features and security. Never: Stream uploads through your app server into a database or local disk, or trust the client-supplied filename, content type or dimensions. Why: Big synchronous uploads tie up workers and a forged content type is a real attack vector. Validate server-side and serve user media from a separate origin so a malicious file can’t run in your app’s context.

14. Internationalisation

Strings

Do: Externalise every user-facing string into message catalogues, keyed by meaning, and pass variables into a single message; use ICU message format for plurals and gender. Never: Concatenate translated fragments or stitch “(s)” onto a word to fake a plural. Why: Word order, agreement, and plural rules differ per language; a sentence assembled from pieces is grammatical only in English. ICU lets the translator own the structure. Escape hatch: A missing translation falls back to the default locale, never a crash or a raw key.
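A minimal sketch of per-locale plural selection using Intl.PluralRules, the platform API that ICU-style message libraries build on. The catalogue shape and the filesSelected key are hypothetical; a real project would normally use an ICU message library rather than this hand-rolled lookup.

```typescript
// Pick the plural form the locale requires instead of gluing "(s)" onto a word.
type PluralCategory = "zero" | "one" | "two" | "few" | "many" | "other";
type PluralForms = Partial<Record<PluralCategory, string>> & { other: string };

const DEFAULT_LOCALE = "en";

const catalogue: Record<string, { filesSelected: PluralForms }> = {
  en: {
    filesSelected: { one: "{count} file selected", other: "{count} files selected" },
  },
  pl: {
    filesSelected: {
      one: "Wybrano {count} plik",
      few: "Wybrano {count} pliki",
      many: "Wybrano {count} plików",
      other: "Wybrano {count} pliku",
    },
  },
};

function filesSelected(locale: string, count: number): string {
  // A missing translation falls back to the default locale, never a raw key.
  const key = Object.prototype.hasOwnProperty.call(catalogue, locale) ? locale : DEFAULT_LOCALE;
  const forms = catalogue[key].filesSelected;
  const category = new Intl.PluralRules(key).select(count);
  const template = forms[category] ?? forms.other;
  return template.replace("{count}", String(count));
}

console.log(filesSelected("en", 1)); // 1 file selected
console.log(filesSelected("pl", 5)); // Wybrano 5 plików
```

English needs two forms and Polish needs four, which is exactly what string concatenation cannot express.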

Formatting

Do: Format dates, numbers, and currency per locale at the display edge using the platform’s locale APIs; store and compute in UTC and minor units (see Data). Never: Hardcode “$”, assume comma-vs-period separators, or persist a wall-clock local time. Why: Separators, currency placement, and month order vary by locale, and a stored local time is ambiguous the moment a timezone or DST boundary moves. Escape hatch: Show times in the user’s timezone and label them; parse user-entered numbers and dates locale-aware too, not just output.
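A short sketch of formatting at the display edge with the built-in Intl APIs, assuming money is stored as integer minor units and instants as UTC ISO strings. formatMoney and formatInstant are hypothetical helper names.

```typescript
// Stored values stay locale-free; only the final string is locale-specific.
function formatMoney(minorUnits: number, currency: string, locale: string): string {
  const fmt = new Intl.NumberFormat(locale, { style: "currency", currency });
  // The formatter knows how many decimals the currency uses (2 for USD, 0 for JPY).
  const digits = fmt.resolvedOptions().maximumFractionDigits ?? 2;
  return fmt.format(minorUnits / 10 ** digits);
}

function formatInstant(isoUtc: string, locale: string, timeZone: string): string {
  // timeStyle "long" includes the zone name, so the time is labelled.
  return new Intl.DateTimeFormat(locale, {
    dateStyle: "medium",
    timeStyle: "long",
    timeZone,
  }).format(new Date(isoUtc));
}

console.log(formatMoney(123450, "USD", "en-US")); // $1,234.50
console.log(formatMoney(1234, "GBP", "en-GB")); // £12.34
console.log(formatInstant("2025-03-01T14:30:00Z", "en-GB", "Europe/London"));
```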

Routing

Do: Put the locale in the URL (path prefix or subdomain), emit hreflang for every alternate plus an x-default, and define a fallback chain from regional to base language to default. Never: Auto-redirect by IP geolocation alone. Why: A linkable, cacheable locale URL is what crawlers and shared links depend on; IP redirects trap travellers and break those links (see SEO). Escape hatch: Detect from saved preference then Accept-Language, but let the user override and persist their explicit choice.
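The fallback chain and the preference-then-header detection can be sketched as two small pure functions. The supported list and function names are assumptions, and the Accept-Language parsing is simplified (it ignores wildcards and malformed q-values rather than handling every case in the HTTP specification).

```typescript
const SUPPORTED: readonly string[] = ["en", "en-GB", "fr", "pt-BR"]; // hypothetical
const DEFAULT_LOCALE = "en";

// Regional -> base language; returns null when nothing matches.
function matchLocale(requested: string): string | null {
  const wanted = requested.trim().toLowerCase();
  const exact = SUPPORTED.find((l) => l.toLowerCase() === wanted);
  if (exact) return exact;
  const base = wanted.split("-")[0];
  return SUPPORTED.find((l) => l.toLowerCase() === base) ?? null;
}

// Full chain: regional -> base language -> default.
function resolveLocale(requested: string): string {
  return matchLocale(requested) ?? DEFAULT_LOCALE;
}

// Saved preference wins, then Accept-Language in q order, then the default.
function pickLocale(savedPreference: string | null, acceptLanguage: string): string {
  if (savedPreference) return resolveLocale(savedPreference);
  const candidates = acceptLanguage
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      return { tag: tag.trim(), q: q ? Number(q.slice(2)) : 1 };
    })
    .filter((c) => c.tag !== "" && c.tag !== "*" && !Number.isNaN(c.q))
    .sort((a, b) => b.q - a.q);
  for (const c of candidates) {
    const match = matchLocale(c.tag);
    if (match) return match;
  }
  return DEFAULT_LOCALE;
}

console.log(pickLocale(null, "de-DE,fr-CH;q=0.8,en;q=0.5")); // fr
```

The result should pick the locale URL to redirect to once, and the user's explicit choice is then persisted and passed back in as savedPreference.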

Layout and assumptions

Do: Support RTL with logical (start/end) CSS properties, design for text expansion, and keep name, address, and phone fields flexible per country; store phone numbers as E.164 and use Unicode end to end. Never: Size containers around English copy, force a first/last name split, or assume a fixed address shape or Latin script. Why: Translations run longer than English and clip; rigid name and address forms lock out most of the world’s users. Escape hatch: None worth taking; keep text out of images so it can translate and reflow.

Deep localisation (translation workflow, locale-specific UX and content strategy) is specialist work; see The boundary.

15. Common features

Sending email

Global (every pathway):

Do: Send through a provider (Amazon SES, Postmark, Resend, SendGrid). Configure SPF, DKIM, and DMARC on the sending domain. Send from a background job, not the request cycle, and make sends idempotent (dedupe on a message key so a retry does not double-send). Never: Run your own SMTP server, or block the HTTP request waiting on the provider’s API. Why: Self-hosted SMTP lands in spam. IP reputation and the SPF/DKIM/DMARC trifecta are the whole game, and transactional providers solve both for you. Escape hatch: Postmark for transactional (best inbox placement, but roughly 10x the price of SES at scale); SES when you are already deep in AWS and want the lowest cost; Resend if you want React-based templates and a modern API and don’t need extreme scale.

Pathway: AWS

Do: Use SES (cheapest at scale and native on AWS); verify the domain and set SPF/DKIM/DMARC; request production access to leave the sandbox before launch. Never: Forget the sandbox: a new SES account can only send to verified addresses until you request production access.

Pathway: Managed (Railway / Render / Fly)

Do: Use Resend or Postmark for the simplest API and onboarding; they walk you through SPF/DKIM/DMARC. Store the API key as a platform secret. Never: Send mail over SMTP straight from the platform’s IPs.

Pathway: Non-dev (just get it live)

Agent: Integrate Resend or Postmark by API key (stored as a secret); send from a background job. Tell the user: “Sign up at resend.com or postmarkapp.com, add your domain (they show the exact DNS records to paste so mail doesn’t go to spam), and give the agent the API key. Don’t try to send email from your own server, it lands in spam.”

File storage

Global (every pathway):

Do: Use object storage (S3, Cloudflare R2, GCS). Keep buckets private by default. Upload and download directly between client and bucket using time-limited pre-signed URLs, so bytes never stream through your app server. Never: Write user uploads to the app server’s local disk. Why: Local disk is ephemeral (gone on redeploy/restart) and not shared across instances, so a file written by one replica is invisible to the next request. Proxying bytes through the app wastes its memory and bandwidth. Escape hatch: R2 when egress cost matters (zero egress fees, S3-compatible pre-signed URLs); S3 otherwise.

Pathway: AWS

Do: S3 with private buckets and presigned URLs for up/download; scope the task role to the one bucket; put CloudFront in front for public-read assets. Never: Make the bucket public to “make it work,” or grant s3:* on *.

Pathway: Managed (Railway / Render / Fly)

Do: Use Cloudflare R2 or S3 (these platforms have no durable local disk either); private buckets, presigned URLs. R2 if egress cost matters. Never: Write uploads to the container’s disk; it’s wiped on every deploy.

Pathway: Non-dev (just get it live)

Agent: Set up Cloudflare R2 (or S3) with presigned URLs; never the app’s local disk. Tell the user: “Uploaded files must go to a storage service (Cloudflare R2 or Amazon S3), not ‘the server’, files saved on the server disappear every time the app restarts. The agent sets this up.”

Search

Do: Start in Postgres. Use ILIKE for trivial substring matching; use full-text search (a tsvector column with a GIN index) for real search. Only when you measure that Postgres FTS falls short (on relevance tuning, faceting, typo tolerance, or scale) reach for a dedicated engine: Typesense or Meilisearch for simple, fast, typo-tolerant app search; Elasticsearch or its fork OpenSearch for heavy aggregations and log-scale needs (pick OpenSearch if you are on AWS). Never: Stand up Elasticsearch on day one for a search box over one table. Why: A dedicated engine is a second datastore to sync, secure, and keep consistent. That is real operational weight you should not pay before Postgres demonstrably can’t cope.

Webhooks (receiving)

Do: Verify the HMAC signature on every inbound webhook and reject anything unsigned or invalid. Dedupe on the provider’s event id and make the handler idempotent, because providers retry and resend duplicates. Acknowledge fast with a 2xx, then do the real work asynchronously (see Cron / scheduled work for the job runner). Never: Trust the payload because it “came from Stripe.” Never do slow work before returning the 2xx: the provider times out and retries, multiplying the load. Why: Without signature verification anyone who learns the URL can forge events; without idempotency, normal provider retries cause double-processing. Escape hatch: If the payload contains URLs you then fetch, treat it as SSRF-prone: allowlist destinations and block internal/metadata addresses.
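A generic sketch of verify, dedupe, acknowledge using Node's crypto module. It assumes a hex HMAC-SHA256 over the raw body and an event with an id field; real providers define their own header names and signed-payload formats, so follow their documentation (or SDK) for the exact scheme. The Set stands in for a table with a unique index on the event id.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compare the received signature with one computed over the RAW body.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Stand-in for durable storage of processed event ids.
const seenEventIds = new Set<string>();

type Outcome = "rejected" | "duplicate" | "accepted";

function receiveWebhook(rawBody: string, signatureHex: string, secret: string): Outcome {
  if (!verifySignature(rawBody, signatureHex, secret)) return "rejected"; // respond 4xx
  const event = JSON.parse(rawBody) as { id: string };
  if (seenEventIds.has(event.id)) return "duplicate"; // respond 2xx, do nothing
  seenEventIds.add(event.id);
  // Hand the real work to a background job here, then respond 2xx at once.
  return "accepted";
}
```

Verification must run on the bytes exactly as received; re-serialising parsed JSON before hashing changes the signature.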

Webhooks (sending)

Do: Sign every outbound payload with HMAC over timestamp + body (and reject stale timestamps on the receiver to stop replay) so receivers can verify it. Include an idempotency key / event id. Retry on failure with exponential backoff plus jitter, cap the attempts, then dead-letter. Use short connect/read timeouts. Never: Retry in a tight loop with no jitter (you stampede a recovering endpoint), or retry forever (you wedge the queue behind one dead consumer). Why: Receivers’ endpoints are flaky and slow; jittered backoff plus a dead-letter cap is the contract that keeps both sides healthy.
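The retry schedule can be sketched as one pure function. This uses the "full jitter" variant (a random wait between zero and the capped exponential ceiling), which is one common choice among several; the RetryPolicy shape and the numbers are assumptions.

```typescript
// Full-jitter exponential backoff with an attempt cap.
// Returns the wait in milliseconds, or null when it is time to dead-letter.
interface RetryPolicy {
  baseMs: number;
  capMs: number;
  maxAttempts: number;
}

function nextDelayMs(
  attempt: number, // 0 for the first retry
  policy: RetryPolicy,
  random: () => number = Math.random, // injectable so the schedule is testable
): number | null {
  if (attempt >= policy.maxAttempts) return null; // stop retrying: dead-letter
  const ceiling = Math.min(policy.capMs, policy.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical policy: 1s base, 60s cap, 8 attempts.
const policy: RetryPolicy = { baseMs: 1000, capMs: 60_000, maxAttempts: 8 };
for (let attempt = 0; attempt <= policy.maxAttempts; attempt++) {
  console.log(attempt, nextDelayMs(attempt, policy));
}
```

The randomness is what spreads retries from many senders apart; the cap and the null return are what stop one dead consumer from wedging the queue.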

Cron / scheduled work

Do: Run scheduled jobs on a managed scheduler: a cloud scheduler, platform cron, or a durable job runner. Make every job idempotent and safe to overlap. For “this scheduled job must run on exactly one instance,” elect a single runner with a Postgres advisory lock (pg_try_advisory_lock). For “many workers pull distinct jobs off a queue,” use SELECT ... FOR UPDATE SKIP LOCKED. Never: Hand-roll an in-process setInterval/background timer for scheduled work. Why: An in-process timer dies silently when the instance restarts, and double-fires once you run more than one replica: two instances both wake at midnight and run the job twice.
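A sketch of the single-runner election with pg_try_advisory_lock. The Queryable interface is a hypothetical minimal connection shape, loosely modelled on a Postgres client, so the example carries no dependency; runIfLeader and the lock key are also assumptions. Session-level advisory locks belong to one connection, so the lock and unlock must run on the same dedicated connection, not on arbitrary pool checkouts.

```typescript
// Minimal connection shape (an assumption, not a specific driver's API).
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Runs the job only if this instance wins the lock; returns whether it ran.
async function runIfLeader(
  conn: Queryable,
  lockKey: number,
  job: () => Promise<void>,
): Promise<boolean> {
  const res = await conn.query("SELECT pg_try_advisory_lock($1) AS locked", [lockKey]);
  if (res.rows[0]?.locked !== true) return false; // another instance holds it
  try {
    await job();
    return true;
  } finally {
    // Release on the same connection that took the lock.
    await conn.query("SELECT pg_advisory_unlock($1)", [lockKey]);
  }
}
```

Every replica can fire the same schedule; only the one that gets true back does the work, and the job itself stays idempotent in case a run is repeated after a crash.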

Real-time / websockets

Do: Reach for real-time only when polling genuinely won’t do. For one-way server-to-client streams (notifications, progress, token streaming), prefer Server-Sent Events: plain HTTP, built-in reconnection, no extra infra. Use websockets only when the client must also push to the server; they are stateful and don’t fit a stateless app tier, so terminate them on a managed service (Pusher, Ably) or a separate horizontally-scalable layer, and never pin a client to one app instance. Never: Hold long-lived websocket connections directly on the same instances that serve HTTP requests, sticky-routed to one box. Why: Stateful connections on the app tier break deploys, autoscaling, and load balancing: every restart drops every socket, and you can’t scale out without sticky routing.

Payments

Do: Use Stripe (or an equivalent PSP). Drive the card flow through the provider’s hosted Checkout (or a fully provider-hosted iframe) so the browser tokenizes the card directly with the provider and your pages never serve the card form; your server only ever sees the token. Reconcile payment state from idempotent, signature-verified webhooks (see Webhooks (receiving)), not from the redirect/success callback. Never: Accept, log, or store a raw card number, CVV, or full PAN anywhere, not even “temporarily.” Never treat the success redirect as proof of payment. Why: The one rule: card data never touches your servers. The moment a PAN hits your infrastructure you are in full PCI-DSS scope. Fully hosted Checkout keeps you in the simplest tier (SAQ A); embedded Elements/Stripe.js served from your own page usually lands in SAQ A-EP (more controls, scripts must be inventoried), so prefer hosted Checkout unless a product need forces embedded fields. The redirect can be forged or interrupted; the webhook is the source of truth. Escape hatch: Use embedded Elements when the checkout UX requirement is non-negotiable, and accept the SAQ A-EP obligations (script/integrity monitoring) that come with it.

Money and ledgers

Do: For anything with balances, orders, or a ledger (credits, wallets, accounting, trading), enforce: integer minor units (see Money in Data); writes made idempotent with a client-supplied key; an append-only or double-entry ledger rather than balances mutated in place; reconciliation against the external source of truth (the PSP, the bank, the exchange); and an immutable audit trail. Never: Represent money as floats, place a charge or order without an idempotency key, mutate a balance with no audit record, or treat your own database as the source of truth for funds held somewhere else. Why: Ledgers and order systems fail differently from card checkout. The risks are double-spends, lost writes, and un-auditable drift, none of which hosted Checkout covers. Boundary: Exchange and trading-engine internals, regulatory and licensing requirements, and formal accounting standards are out of scope (see The boundary). This entry is the integrity invariants, not a fintech course.
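The invariants above (integer minor units, idempotent writes, append-only double entry, derived balances) fit in a small in-memory sketch. The arrays stand in for tables and every name here is hypothetical; a real ledger writes both entries in one database transaction and reconciles against the external source of truth.

```typescript
// Append-only double-entry ledger in integer minor units (e.g. pence).
interface Entry {
  txId: string; // idempotency key supplied by the caller
  account: string;
  amount: number; // signed minor units: debit negative, credit positive
}

class Ledger {
  private entries: Entry[] = []; // only ever appended to: this is the audit trail
  private seenTx = new Set<string>();

  // Returns false when txId was already recorded (a replay), true otherwise.
  transfer(txId: string, from: string, to: string, amount: number): boolean {
    if (!Number.isInteger(amount) || amount <= 0) {
      throw new Error("amount must be a positive integer of minor units");
    }
    if (this.seenTx.has(txId)) return false;
    this.seenTx.add(txId);
    // Two entries that sum to zero, recorded together.
    this.entries.push(
      { txId, account: from, amount: -amount },
      { txId, account: to, amount },
    );
    return true;
  }

  // Balances are derived from entries, never stored and mutated in place.
  balance(account: string): number {
    return this.entries
      .filter((e) => e.account === account)
      .reduce((sum, e) => sum + e.amount, 0);
  }

  // Across all accounts the entries must always net to zero.
  total(): number {
    return this.entries.reduce((sum, e) => sum + e.amount, 0);
  }
}
```

A non-zero total() or a balance that disagrees with the PSP is the drift signal reconciliation exists to catch.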

16. Scaling

Caching

Global (every pathway):

Do: Ship with NO cache. When a specific, MEASURED query or endpoint is the bottleneck, fix it in this order: (1) optimise it (add the right index, rewrite the query); (2) add in-process memoisation or HTTP caching (Cache-Control / ETag); (3) reach for a shared cache (Redis/Memcached) only when the cached value must be shared across instances. Never: Add Redis on day one for imagined load. Why: A cache doubles your moving parts and hands you a cache-invalidation and consistency problem you did not have. Most “slow” endpoints are a missing index (see Read replicas for the same discipline: index first). Escape hatch: A genuinely shared, expensive-to-compute, read-heavy value across N instances (e.g. a rate limiter or session store) justifies Redis early, but name the value and the metric first.

Before (agent cold):

// docker-compose: app + postgres + redis, on day one
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL);

async function getUser(id: string) {
  const cached = await redis.get(`user:${id}`);
  if (cached) return JSON.parse(cached);
  const user = await db.user.findUnique({ where: { id } });
  await redis.set(`user:${id}`, JSON.stringify(user), "EX", 60);
  return user;
  // now: stale users after edits, an extra service to run,
  // and a cache-invalidation bug waiting on every write path
}

After (this dictionary):

// app + postgres. one query, correctly indexed.
async function getUser(id: string) {
  return db.user.findUnique({ where: { id } }); // PK lookup, already fast
}
// Add caching only when a metric (p95 latency, DB CPU) demands it,
// and only after the index/query is already optimal.
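When a metric does demand step (2), HTTP caching with a validator is often enough before any shared cache. A framework-agnostic sketch, assuming Node's crypto module; respondWithEtag and the header choices are illustrative, and weak validators are handled only by stripping the W/ prefix.

```typescript
import { createHash } from "node:crypto";

interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string;
}

// Answer 304 with no body when the client already holds the current version.
function respondWithEtag(body: string, ifNoneMatch: string | undefined): CachedResponse {
  const etag = `"${createHash("sha256").update(body).digest("hex").slice(0, 32)}"`;
  const headers: Record<string, string> = {
    ETag: etag,
    // "no-cache" lets the browser store the response but revalidate before reuse.
    "Cache-Control": "private, no-cache",
  };
  const presented = (ifNoneMatch ?? "")
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""));
  if (presented.includes(etag)) {
    return { status: 304, headers, body: "" };
  }
  return { status: 200, headers, body };
}

const first = respondWithEtag('{"name":"Ada"}', undefined);
const second = respondWithEtag('{"name":"Ada"}', first.headers.ETag);
console.log(first.status, second.status); // 200 304
```

This saves bandwidth rather than database work, because the body is still computed to derive the tag; it adds no new service and nothing to invalidate.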

Pathway: AWS

Do: When a metric (not a guess) demands a shared cache, use ElastiCache (Redis/Valkey) or MemoryDB. Index and query-tune first. Never: Provision ElastiCache pre-emptively “because AWS has it.”

Pathway: Managed (Railway / Render / Fly)

Do: Add the platform’s Redis add-on only when a measured shared-cache need appears; use its connection string. Never: Enable the Redis add-on by default; it’s another bill and another moving part.

Pathway: Non-dev (just get it live)

Tell the user: “You almost certainly don’t need a cache (Redis) yet, skip it. If pages get slow later, the first fix is nearly always a database index, not a cache.”

Background jobs / async

Do: Run slow work (email, image/video processing, third-party API calls) inline in the request to start, but put it behind a single function/interface (e.g. enqueueX()) that runs inline now and can be swapped to a real queue later without touching call sites. Never: Build a queue + worker before you need one, and never inline slow work as a raw call you will have to hunt down and rewrite later. Why: The interface is the cheap insurance; the infrastructure is the expensive part you defer. Move to a queue + worker (see Queues) when requests get slow, the work needs retries, or you must absorb spikes.
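The "one seam" idea might look like this. EmailJob, sendEmailNow and setEmailBackend are hypothetical names; the point is that call sites only ever see enqueueEmail, so swapping the backend later touches one line.

```typescript
// One seam for slow work: runs inline today, can hand off to a queue later.
type EmailJob = { to: string; template: string };
type JobBackend = (job: EmailJob) => Promise<void>;

// Stand-in for the real provider call (hypothetical).
const sent: EmailJob[] = [];
async function sendEmailNow(job: EmailJob): Promise<void> {
  sent.push(job);
}

// Today the backend does the work inline. Later it can become a function
// that inserts a row into a jobs table or publishes to a queue.
let backend: JobBackend = sendEmailNow;

function setEmailBackend(next: JobBackend): void {
  backend = next;
}

// Call sites depend on this function only.
async function enqueueEmail(job: EmailJob): Promise<void> {
  await backend(job);
}
```

Moving to a worker then means calling setEmailBackend once at startup with the queue-backed function, with no change at any call site.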

Queues

Global (every pathway):

Do: Start with a database-backed queue: a jobs table polled by a worker using SELECT ... FOR UPDATE SKIP LOCKED. Add a real queue only when you have a genuine async need: retries with backoff, decoupling producers from consumers, rate-smoothing, or scheduled fan-out. Never: Add a queue before one of those needs is real and observed. Why: A DB-backed queue is a perfectly good first queue: it is one less piece of infrastructure, and it shares your existing transaction and backup story. Graduate to SQS (managed, durable, built-in DLQ + retry) or a Redis-backed queue (e.g. BullMQ) only when throughput or fan-out outgrows the DB, or you already run that infrastructure. Escape hatch: If you are already on AWS and want zero queue ops, SQS from the start is defensible, but it is still infrastructure, so name the async need first.
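A sketch of the claim step for a jobs table. The table and column names (jobs, status, run_at, kind, payload) and the Queryable connection shape are assumptions for illustration, not a prescribed schema or a specific driver's API.

```typescript
// Minimal connection shape (an assumption, loosely modelled on a Postgres client).
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Claim one due job. SKIP LOCKED makes concurrent workers pass over rows
// another worker has already locked, so each job goes to exactly one worker.
const CLAIM_SQL = `
  UPDATE jobs
     SET status = 'running'
   WHERE id = (
     SELECT id FROM jobs
      WHERE status = 'queued' AND run_at <= now()
      ORDER BY run_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id, kind, payload`;

interface Job {
  id: number;
  kind: string;
  payload: unknown;
}

async function claimNextJob(db: Queryable): Promise<Job | null> {
  const { rows } = await db.query(CLAIM_SQL);
  const row = rows[0];
  // No row: nothing is due, or every due row is held by another worker.
  if (!row) return null;
  return { id: Number(row.id), kind: String(row.kind), payload: row.payload };
}
```

A worker loops on claimNextJob, runs the job, then marks it done or schedules a retry; a job stuck in 'running' after a crash needs a timeout sweep, which this sketch leaves out.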

Pathway: AWS

Do: When the DB-backed queue is outgrown, use SQS (managed, at-least-once, built-in retries + dead-letter queue) with workers on Fargate; EventBridge Scheduler for scheduled fan-out. Never: Build a bespoke broker on EC2.

Pathway: Managed (Railway / Render / Fly)

Do: Run a separate worker service (the platform supports a worker process) backed by the DB-as-queue, or its Redis add-on (e.g. BullMQ) when you need it. Never: Run a long-lived worker inside the web process; give it its own service.

Pathway: Non-dev (just get it live)

Agent: Use the database-as-queue (a jobs table + SKIP LOCKED worker); don’t add separate queue infrastructure. Tell the user: “No separate ‘queue’ service is needed yet; your app’s database does this job.”

Read replicas

Do: Add a read replica late, only when reads measurably saturate the primary AND you have already indexed the slow queries and added caching (see Caching). Never: Add a replica as a default “for scale,” or to mask a missing index. Why: Most apps never need one. Vertical scaling plus good indexes goes a very long way. Escape hatch: Replicas are also legitimate for isolating heavy analytics/reporting from OLTP, but mind replication lag: route reads that must reflect a just-completed write (read-your-own-writes) back to the primary.
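
The read-your-own-writes routing in the escape hatch can be sketched as a small pure function. This assumes the app records the time of each user’s last write somewhere shared (for example alongside the session), and that `REPLICA_LAG_BUDGET_MS` is picked from measured replication lag with headroom; both names and the 2-second figure are hypothetical.

```typescript
type ReadTarget = "primary" | "replica";

// Assumption: chosen from observed replication lag, with headroom.
const REPLICA_LAG_BUDGET_MS = 2000;

// Route a read to the primary while the replica may not yet reflect the
// user's most recent write; otherwise the replica is fine.
function chooseReadTarget(lastWriteAtMs: number | null, nowMs: number): ReadTarget {
  if (lastWriteAtMs === null) return "replica";
  return nowMs - lastWriteAtMs < REPLICA_LAG_BUDGET_MS ? "primary" : "replica";
}
```

For example, `chooseReadTarget(Date.now() - 500, Date.now())` pins the read to the primary, while a user who has not written recently reads from the replica.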

Sharding

Do: Don’t. Scale vertically, add good indexes, and at most add one read replica; that covers all but the very largest systems. Never: Shard a 12-month-old app, or design for sharding “in case.” Why: Sharding forfeits cross-shard joins, transactions, and uniqueness, and is a permanent tax on every query. Escape hatch: If you genuinely think you need it, gather more evidence first, exhaust the biggest instance, partition tables within one Postgres, and only then revisit.

Statelessness

Do: Keep the app process stateless so you can run N copies behind a load balancer. Push state to Postgres (sessions/data), object storage (uploads), or the auth provider (tokens). Coordinate via the database, not in-memory. Never: Hold sessions, uploaded files, or locks in process memory tied to one instance, or rely on sticky sessions. Why: In-memory state silently breaks the moment you add the second instance, and horizontal scaling is the whole point of staying stateless. Escape hatch: Caches and ephemeral compute may live in memory (see Caching), provided losing an instance only costs a recompute, never correctness.

17. Observability & ops

Healthchecks & observability

Global (every pathway):

Do: Expose separate liveness and readiness endpoints (e.g. /livez and /readyz) that the orchestrator and load balancer poll. Wire error tracking (Sentry, or GlitchTip self-hosted) and structured logs (see Logging) from the first deploy, and watch the three signals that matter: latency, error rate, saturation. Never: Launch blind and plan to “add monitoring later,” or collapse liveness and readiness into one endpoint. You cannot operate, debug, or roll back what you cannot see. Why: Readiness gates traffic; liveness gates restarts. If liveness checks a dependency, a slow database triggers a restart loop that turns a degradation into an outage. Escape hatch: Readiness may check the database and critical dependencies; keep liveness cheap, in-process, and dependency-free.
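
A framework-agnostic sketch of the two endpoints, assuming each dependency is represented by a hypothetical `DependencyCheck` that throws when it is unhealthy. `livez` touches nothing outside the process; `readyz` runs the checks with a timeout and reports 503 if any fail. The response shape is an assumption, not a standard.

```typescript
type HealthResponse = { status: number; body: { ok: boolean; failed?: string[] } };
type DependencyCheck = { name: string; check: () => Promise<void> };

// Liveness: cheap, in-process, dependency-free. If this answers, the process
// is alive and should not be restarted.
function livez(): HealthResponse {
  return { status: 200, body: { ok: true } };
}

// Rejects if the promise does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Readiness: may check the database and critical dependencies. A failure
// takes the instance out of rotation without restarting it.
async function readyz(checks: DependencyCheck[], timeoutMs = 1000): Promise<HealthResponse> {
  const results = await Promise.all(
    checks.map(async ({ name, check }) => {
      try {
        await withTimeout(check(), timeoutMs);
        return null;
      } catch {
        return name;
      }
    }),
  );
  const failed = results.filter((r): r is string => r !== null);
  return failed.length === 0
    ? { status: 200, body: { ok: true } }
    : { status: 503, body: { ok: false, failed } };
}
```

A typical database check would be a function that runs `SELECT 1` through the app’s existing connection pool and throws on error.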

Pathway: AWS

Do: Point the ALB/ECS health check at /readyz; ship logs and metrics to CloudWatch; add Sentry for error tracking; use Container Insights for saturation. Never: Point the ALB health check at a / that does heavy work or hits the database.

Pathway: Managed (Railway / Render / Fly)

Do: Use the platform’s built-in health check (point it at /readyz) and log stream; add Sentry for errors. Never: Rely on the platform dashboard alone; add Sentry so you’re told when something breaks.

Pathway: Non-dev (just get it live)

Agent: Expose /readyz, set the platform’s health check to use it, and add Sentry. Tell the user: “Your platform shows logs and uptime in its dashboard. Add Sentry (sentry.io) so you get emailed when something breaks instead of hearing it from a user.”

Logging

Do: Emit structured JSON logs to stdout and let the platform collect them. Attach a correlation/request id to every log line and thread it through the call stack (AsyncLocalStorage in Node, a context object elsewhere) so a single request is traceable end to end. Never: String-concatenate console spew, write logs to local files the container will lose on restart, or log secrets, tokens, passwords, or PII. Why: Without a per-request id you cannot reconstruct what happened to one user among thousands of interleaved log lines.
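
A minimal Node sketch of the request-id threading, using `AsyncLocalStorage` from `node:async_hooks`. The helper names `withRequestContext` and `log`, and the log field names, are assumptions for illustration; a real service would usually wrap an existing logger rather than call `console.log` directly.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type RequestContext = { requestId: string };

// One store per process; each request runs inside its own context.
const requestContext = new AsyncLocalStorage<RequestContext>();

// Wrap the handling of one request. Reuses an incoming id (for example from
// an upstream proxy header) or mints a new one.
function withRequestContext<T>(incomingId: string | undefined, fn: () => T): T {
  return requestContext.run({ requestId: incomingId ?? randomUUID() }, fn);
}

// Emits one JSON object per line to stdout and returns the line.
// Core fields are written last so caller-supplied fields cannot overwrite them.
function log(
  level: "info" | "warn" | "error",
  msg: string,
  fields: Record<string, unknown> = {},
): string {
  const line = JSON.stringify({
    ...fields,
    ts: new Date().toISOString(),
    level,
    msg,
    requestId: requestContext.getStore()?.requestId ?? null,
  });
  console.log(line);
  return line;
}
```

Any code called inside `withRequestContext`, including across `await` boundaries, gets the same `requestId` on its log lines without passing it through every function signature.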

Backups

Do: Run automated, regular database backups with point-in-time recovery, AND test a real restore on a schedule. Write down your RPO (how much data you can lose) and RTO (how fast you must be back), and confirm your backup cadence and retention actually meet them. Never: Trust a backup you have never restored. An untested backup is not a backup, it is a hope. Why: Backups fail silently (wrong scope, corrupt dump, expired credentials, retention too short for your RPO); the restore drill is the only proof they work.

Monitoring / alerting

Do: Alert on symptoms users actually feel: error rate, latency (p95/p99), availability, queue depth. Make every alert actionable and route it to a specific on-call person. Start with a minimal set and add alerts only after a real incident reveals the gap. Never: Page on causes and internal noise (CPU at 80%, individual log lines) that wake people without telling them what to do. Alert fatigue means the real page gets ignored. Why: An alert nobody can act on is noise; noise trains people to silence the pager.

Operational safety: kill switch and circuit breakers

Do: Give every system that takes real-world action (money movement, sending, trading, deploying, any irreversible external call) two things. First, a one-flip safe mode (stop / close-only / read-only) that halts side effects instantly without a deploy, reachable fast, with documented access. Second, an automatic circuit breaker that trips on anomaly (an error-rate spike, a spend or loss cap breached, a reconciliation mismatch, a latency cliff) and fails safe by halting the risky action rather than barrelling on. Persist the tripped state (write it through to the database) and re-engage it on startup; a restart, crash-loop, or redeploy must never silently clear the stop, and if the state can’t be read on boot, fail safe and start in safe mode. Write down who can pull the manual stop and what trips the automatic one. Test that it actually trips and holds: trip the breaker in a test, restart the process, and confirm it’s still tripped, the same way you test a restore and not just the backup. Never: Ship a side-effecting system whose only off switch is a code deploy or tearing down infrastructure. Never let a breaker fail open, that is, keep acting when the safety signal itself is broken. Never hold the stop only in memory; a process that comes back up with side effects silently re-enabled is the protection undoing itself at the worst possible moment. Why: When something goes wrong in a system that acts on the world, the first need is to stop the bleeding instantly, before you understand why. This is the thing you reach for at 3am, and it matters most for autonomous, tool-calling agents (see Building AI features): autonomy without a stop is how a bug becomes a disaster. A kill switch exists for when things are going wrong, which is exactly when processes restart and deploys happen, so if the tripped state doesn’t survive a restart, a crash or a deploy becomes the thing that re-arms the danger. Escape hatch: A purely read-only system needs monitoring, not a kill switch. The moment it can act, it needs the stop. (Trading-engine breaker internals and financial circuit-breaker regulation are specialist territory; see The boundary.)

Before (agent cold): a bad signal fires; the on-call scrambles to write and deploy a hotfix while the damage compounds. After (this dictionary): flip safe mode in one action, the system stops acting, then diagnose calmly.
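
A sketch of a persisted safe mode with an error-rate breaker, under stated assumptions: `SafetyStore` is a hypothetical seam over a row in the database, and the window size and threshold are placeholder values. The class starts in safe mode until the stored state has been read, writes every trip through the store, and treats an unreadable store as tripped.

```typescript
type SafetyState = "active" | "safe_mode";

// Hypothetical persistence seam; in a real app this reads and writes a row
// in Postgres so the state survives restarts and redeploys.
interface SafetyStore {
  read(): Promise<SafetyState>;
  write(state: SafetyState, reason: string): Promise<void>;
}

class SafetySwitch {
  private state: SafetyState = "safe_mode"; // safe until proven otherwise
  private recent: boolean[] = [];
  private readonly store: SafetyStore;
  private readonly windowSize: number;
  private readonly maxErrorRate: number;

  constructor(store: SafetyStore, windowSize = 20, maxErrorRate = 0.5) {
    this.store = store;
    this.windowSize = windowSize;
    this.maxErrorRate = maxErrorRate;
  }

  // Call once on boot, before any side effect is attempted.
  async start(): Promise<void> {
    try {
      this.state = await this.store.read();
    } catch {
      this.state = "safe_mode"; // state unreadable: fail safe
    }
  }

  // Manual kill switch, also used by the automatic breaker.
  async trip(reason: string): Promise<void> {
    this.state = "safe_mode"; // halt locally even if the write below fails
    await this.store.write("safe_mode", reason);
  }

  // Resuming is a deliberate, recorded action, never a side effect of a restart.
  async reset(reason: string): Promise<void> {
    await this.store.write("active", reason);
    this.state = "active";
    this.recent = [];
  }

  // Every side effect goes through here.
  async act<T>(sideEffect: () => Promise<T>): Promise<T> {
    if (this.state !== "active") throw new Error("safe mode: side effects halted");
    try {
      const result = await sideEffect();
      await this.record(true);
      return result;
    } catch (err) {
      await this.record(false);
      throw err;
    }
  }

  // Trips when the failure rate over a full window reaches the threshold.
  private async record(ok: boolean): Promise<void> {
    this.recent.push(ok);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const failures = this.recent.filter((r) => !r).length;
    if (this.recent.length === this.windowSize && failures / this.windowSize >= this.maxErrorRate) {
      await this.trip("error rate over threshold");
    }
  }
}
```

With several instances running, each one would also need to re-read the stored state (per action or on a short poll) so a trip on one instance halts the others; that is left out of this sketch.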

18. Deployment & CI/CD

What “production-ready” means

Do: Treat “production-ready” as a concrete checklist, not a vibe. Ship only when all of these hold: it deploys repeatably from CI with zero manual steps; secrets are external (see Secrets in deploy); it has a healthcheck, structured logs, error tracking, and a few key metrics with alerts (see Healthchecks & observability, Monitoring / alerting); migrations run safely and reversibly (see Migrations); backups exist AND a restore has been tested (see Backups); and a fast rollback exists (see Rollback plan). Never: Equate “production-ready” with “feature-complete” or “perfect.” It is the operational floor that lets you survive your own mistakes. Why: Every item on this list is something you cannot retrofit calmly at 3am during an incident.

Secrets in deploy

Do: Inject secrets at runtime from the platform secret store (env vars or mounted secret files) so they exist only in the running process’s memory (see Secrets). Never: Bake secrets into the image, commit them to the repo, or pass them as Docker build args. Build args are recorded in image history (docker history) and CI logs forever. If a secret is needed at build time, use a BuildKit secret mount (RUN --mount=type=secret), which never lands in a layer. Escape hatch: Build-time public config (API base URLs, feature flags) is fine as a build arg; anything that grants access is a runtime secret.

Env parity

Do: Keep dev, staging, and prod as similar as possible: same database engine and major version, same runtime major version. Use containers to kill “works on my machine.” Never: Use SQLite in dev and Postgres in prod, or run different Postgres major versions across envs. Differences in SQL dialect, types, transactions, collation, and constraints surface only in production. Escape hatch: Staging can run smaller instances and fewer replicas; keep the engines and major versions identical even when sizing differs.

Zero-downtime deploys

Global (every pathway):

Do: Roll out with rolling or blue-green deploys behind a load balancer that uses the readiness check to gate traffic. Pair every rollout with backward-compatible expand-contract migrations so in-flight requests on the old code keep working through the rollout window (see Migrations, Rollback plan). Never: Stop the old version before the new version passes readiness, or apply a breaking schema change mid-rollout while both versions are live. Escape hatch: A brief, announced maintenance window is acceptable for a genuinely unavoidable destructive migration; keep it rare.

Pathway: AWS

Do: ECS rolling deploys (or blue/green via CodeDeploy) behind the ALB, gated on the readiness check; keep migrations expand-contract. Never: Set the ECS minimum-healthy-percent to 0 during a deploy.

Pathway: Managed (Railway / Render / Fly)

Do: Let the platform’s built-in rolling deploy handle it: it starts the new version, waits for the health check, then shifts traffic. Your job is expand-contract migrations. Never: Disable the health check to “speed up” a deploy.

Pathway: Non-dev (just get it live)

Tell the user: “Your platform already deploys with no downtime: it starts the new version, checks it’s healthy, then switches over. You don’t need to do anything except let the agent keep database changes backward-compatible.”

Feature flags and gradual rollout

Do: Gate risky new behaviour behind a flag and roll it out gradually (internal users, then a small percentage, then everyone), with a documented kill path. This is expand-contract for behaviour, the partner to the migration rule in Data. Never: Ship risky logic to 100% of users in one step with no way to turn it off short of a rollback deploy. Why: You already avoid irreversible schema changes; behaviour deserves the same. A flag turns “deploy and pray” into “roll out and watch.”
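
A sketch of a gradual rollout check, assuming flags are plain records loaded from config or the database. The `Flag` shape and `isEnabled` are hypothetical. Each user is hashed into a stable bucket from 0 to 99, so the same user gets the same answer on every request and raising the percentage only ever adds users.

```typescript
type Flag = {
  enabled: boolean;        // the kill path: false turns it off for everyone
  allowUserIds: string[];  // internal users first
  rolloutPercent: number;  // 0..100
};

// FNV-1a-style 32-bit hash over UTF-16 code units: stable across processes
// and restarts, unlike Math.random().
function stableHash(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function isEnabled(flagName: string, flag: Flag, userId: string): boolean {
  if (!flag.enabled) return false;                     // kill switch wins
  if (flag.allowUserIds.includes(userId)) return true; // internal cohort
  // Including the flag name gives each flag its own bucketing.
  const bucket = stableHash(`${flagName}:${userId}`) % 100;
  return bucket < flag.rolloutPercent;
}
```

Moving from internal users to 5% and then to 100% is a config change to `rolloutPercent`; setting `enabled` to false is the documented kill path and needs no deploy if the flag is read at runtime.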

Rollback plan

Do: Make every deploy revertible in one fast action, with the previous version always one click/command away. Decouple rollback from the database with backward-compatible expand-contract migrations, so old and new code both run against the current schema (see Migrations). Never: Ship a migration that drops or renames a column in the same deploy as the code that depends on it. You can no longer roll back the code without also reverting the schema, and that is the trap that turns a small bug into an outage. Why: Fast rollback is your real safety net; an irreversible schema change quietly removes it.
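
As an illustration of expand-contract, here is a hypothetical rename of `users.fullname` to `users.display_name` written out as a deploy plan, with a small check for the rule above. The table, columns, and the `Deploy` shape are assumptions. The check treats a drop as rollback-safe only if neither the code in that deploy nor the code in the previous deploy (the rollback target) still uses the column.

```typescript
type Deploy = {
  name: string;
  sql: string | null;         // schema or data change shipped in this deploy
  dropsColumn: string | null; // column removed by this deploy, if any
  codeUses: string[];         // columns the code in this deploy reads or writes
};

const renamePlan: Deploy[] = [
  // Expand: add the new column; code writes both, still reads the old one.
  { name: "1 expand", sql: "ALTER TABLE users ADD COLUMN display_name text",
    dropsColumn: null, codeUses: ["fullname", "display_name"] },
  // Backfill rows written before deploy 1; code now reads the new column.
  { name: "2 backfill", sql: "UPDATE users SET display_name = fullname WHERE display_name IS NULL",
    dropsColumn: null, codeUses: ["fullname", "display_name"] },
  // Switch: code stops touching the old column; no schema change.
  { name: "3 switch", sql: null,
    dropsColumn: null, codeUses: ["display_name"] },
  // Contract: only now is the old column dropped.
  { name: "4 contract", sql: "ALTER TABLE users DROP COLUMN fullname",
    dropsColumn: "fullname", codeUses: ["display_name"] },
];

// Returns the names of deploys whose drop would break a one-step rollback.
function unsafeDrops(deploys: Deploy[]): string[] {
  const unsafe: string[] = [];
  deploys.forEach((deploy, i) => {
    if (deploy.dropsColumn === null) return;
    const column = deploy.dropsColumn;
    const previous: Deploy | undefined = i > 0 ? deploys[i - 1] : undefined;
    const stillUsed =
      deploy.codeUses.includes(column) || (previous?.codeUses.includes(column) ?? false);
    if (stillUsed) unsafe.push(deploy.name);
  });
  return unsafe;
}
```

Collapsing steps 3 and 4 into one deploy would be flagged, because rolling the code back one step would land on a version that still writes `fullname` after the column is gone.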

19. Privacy

This is not legal advice. Privacy law varies by jurisdiction and changes; a lawyer signs off on the real thing, not you (see The boundary). What follows is the engineering default that keeps you out of trouble across GDPR, ePrivacy, and US state law.

Minimise

Do: Collect the least data the feature actually needs, decide retention before you write the first row, and default every privacy setting to the more private option. If you can’t name the feature that consumes a field, don’t capture it. Never: Log “everything just in case” or hoover up full request bodies, IPs, and device fingerprints because they might be useful later. Why: Data you never collected can’t leak, can’t be subpoenaed, and isn’t a deletion liability. Minimisation and privacy-by-default are GDPR obligations, not nice-to-haves.

Consent

Do: Block non-essential cookies and trackers until the user opts in. Consent must be freely given, specific, informed, and unambiguous, which means no pre-ticked boxes, and “reject all” must be one click on the same layer with the same prominence (size, colour weight, position) as “accept all”. Essential cookies (session, security, load balancing) need no consent, and that category is narrow, so analytics and ads don’t qualify. Never: Drop the analytics or ad pixel on page load and ask for consent after, bury “reject” behind a “manage preferences” layer, or re-prompt the moment someone rejects. Those are dark patterns and EU regulators (the CNIL among them) are actively sweeping for them. Why: A banner that doesn’t actually gate the scripts is theatre; under ePrivacy the tracker firing before consent is the violation, regardless of what the banner says. Escape hatch: Strictly essential cookies and genuinely first-party, aggregate, non-identifying measurement can run without a prompt, but be honest about what “essential” means.

Honour opt-out signals

Do: Detect Global Privacy Control (the Sec-GPC header and navigator.globalPrivacyControl) and treat it as a binding opt-out of selling and sharing. Under California’s CCPA rules (the revised regulations in force since 1 January 2026) and roughly a dozen other US state laws, GPC is a legally valid opt-out on its own, must apply immediately, and California now expects you to surface that you honoured it (an “Opt-Out Request Honored” style acknowledgement). Make opting out as easy as opting in. Never: Ignore GPC because “it’s just a browser setting”, or require the user to also click your banner after their browser already signalled a preference. Why: This has teeth. For example, the Sephora settlement ($1.2M) was for a site that wasn’t even configured to detect GPC; California, Colorado and Connecticut have run coordinated GPC enforcement sweeps; and the Disney settlement ($2.75M, February 2026) was for not fully effectuating opt-outs across devices. Under GDPR the model is different (consent and withdrawal, not sale opt-out), so respect both frameworks rather than assuming one covers the other.
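
Server-side detection can be as small as the sketch below, assuming a lower-cased header map of the kind Node’s `http` module provides. The function name and return shape are hypothetical; the header itself carries the value `1` when the signal is set. What the app then does with the result (suppressing sale and sharing, showing the acknowledgement) is not shown.

```typescript
type HeaderMap = Record<string, string | string[] | undefined>;

type GpcResult = { optedOut: boolean };

// Reads the Sec-GPC request header. Only the value "1" counts as an opt-out.
function readGpc(headers: HeaderMap): GpcResult {
  const raw = headers["sec-gpc"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  return { optedOut: value?.trim() === "1" };
}
```

In the browser the equivalent check is the `navigator.globalPrivacyControl` property; pages that decide client-side which scripts to load should consult it as well.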

PII

Do: Keep PII out of logs, traces, error reports, and analytics events. Redact at the boundary, encrypt sensitive fields and data at rest and in transit, and write down a retention period and a deletion path for every data class (see Data). Tokenise where you can so the raw value never spreads. Never: Log a full user object, an email, a token, or a card number into your observability stack, or paste production PII into a third-party debugger or LLM. Once it’s in your log pipeline it’s replicated everywhere and you can’t honour a deletion request (see Observability). Why: Logs are the most common accidental PII leak, and they’re usually retained longer and access-controlled less than your actual database.
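
A sketch of redaction at the logging boundary, assuming a hypothetical deny-list of field names. It walks an object before it is logged and replaces the values of sensitive keys. A deny-list only catches fields it knows about and cannot see PII inside free text, so logging an explicit allow-list of fields is the stricter option where that is practical.

```typescript
// Assumed field names; extend to match the app's own schema.
const SENSITIVE_KEYS = new Set([
  "password", "token", "authorization", "email", "cardnumber", "ssn",
]);

// Returns a copy that is safe to log. The input is not mutated.
function redact(value: unknown, depth = 0): unknown {
  if (depth > 8) return "[TRUNCATED]"; // guard against very deep or cyclic input
  if (Array.isArray(value)) return value.map((item) => redact(item, depth + 1));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : redact(inner, depth + 1);
    }
    return out;
  }
  return value;
}
```

Calling `redact` inside the logger itself, rather than at each call site, means a forgotten call cannot leak a field.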

Analytics

Do: Prefer privacy-respecting, cookieless analytics (Plausible, Fathom, or a self-hosted equivalent) that report aggregate metrics without per-user profiles. If you must use Google Analytics or similar, run it behind consent and configure IP anonymisation and the strictest data-sharing settings. Never: Pipe user identifiers, emails, or raw URLs containing tokens into a third-party analytics or ad platform, and never join behavioural data back to a named user without a lawful basis and consent. Why: Sending PII to a third party is a transfer you’re liable for, and most “free” analytics is free because it monetises that data. Escape hatch: Product analytics that genuinely needs user-level events (funnels, cohorts) is fine, but pseudonymise the identifier, get consent, and keep it in a tool you control.

Rights

Do: Build data export and deletion as first-class features, not manual one-offs. A user (or a regulator on their behalf) can demand a copy of their data or its erasure, so know every store that holds their records, including backups, caches, logs, and third-party processors, and have a defined turnaround. Never: Treat “delete” as a soft deleted_at flag and call it done, or forget the copies sitting in your search index, analytics, and email provider. Why: If you can’t enumerate where a user’s data lives, you can’t actually delete it, and “we lost track of it” is not a defence. Design the deletion fan-out when you design the schema (see Data), not when the first request lands.

20. Testing

What to test

Do: Test the risky and the core: money paths, auth and permissions, data integrity, and anything with real branching logic. Cover the unhappy paths (invalid input, expired tokens, concurrent writes, partial failures), because those are what break in production. Assert on observable behaviour (outputs and side effects), not private internals. Never: Chase 100% coverage, or write a test that can never fail. A test that asserts something always true (or just restates the implementation) tests nothing and rots into noise. Don’t write tests for throwaway spikes or trivial glue. Why: Coverage tells you what’s untested; it isn’t a target. A green bar over the wrong assertions is worse than an honest gap, because it buys false confidence on exactly the code (see security, auth, data) that hurts most when it’s wrong. Escape hatch: Prototype you’ll delete next week? Skip the tests. The moment it’s load-bearing, it earns them.

Levels

Do: Unit-test pure logic (calculations, validation, state machines, parsers) in isolation and lean on these for the bulk of your suite. Run data-layer and query logic as integration tests against a real Postgres (Testcontainers or a disposable instance, not SQLite standing in). Keep a handful of end-to-end tests for flows that must never break (signup, login, checkout) and stop there. Never: Mock away the database in tests that exist to verify data logic. A mocked query layer asserts that your mocks behave like your mocks; it never catches a wrong join, a missing constraint, a cascade, or SQL the real planner rejects. Why: The bugs that reach production live in the seams (your SQL against real constraints, your code against the real schema), and an in-memory substitute lies about all of it. Push detail down to fast unit tests, keep the slow broad tests thin; invert that ratio and the suite gets slow and flaky and people stop trusting it. Escape hatch: Mock at the true edges only (third-party HTTP, payment providers, the clock), never your own core logic.

Discipline

Do: Run the full suite in CI on every PR, with a red suite blocking merge so nothing reaches the main branch unproven (see CI/CD, observability). Keep feedback to minutes: parallelise, and split a slow heavy tier from the fast tier when the wait starts to hurt. For every bug, write a failing test that reproduces it first, watch it fail, then fix it. Never: Fix a bug without a test that would have caught it, and never let the suite drift slow enough that people start skipping or disabling it. A muted test is a deleted test that still burns CI time. Why: The failing-test-first rule proves the bug is real, proves the fix actually fixes it, and guards against the same regression forever; fixing blind proves none of those.

Determinism

Do: Make every test deterministic and self-contained. Inject the clock and seed any randomness so two runs are identical. Set up and tear down each test’s own data, ideally in a transaction that rolls back. Tests must pass in any order and in parallel. Never: Rely on test execution order, share mutable state through globals or module-level singletons, or hit the real network in a unit test. Don’t reuse another test’s leftover rows. Why: Order-dependence and shared state produce the worst kind of failure: the flaky one that passes locally, fails in CI, and trains the team to hit retry until it goes green, at which point the suite stops meaning anything. Escape hatch: A genuinely unavoidable external call (a contract test against a sandbox) belongs in a separate, clearly labelled integration tier, never mixed into the fast unit run.
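
A sketch of clock and randomness injection. The functions under test (`isExpired`, `rollDie`) are hypothetical examples; the seeded generator is a small mulberry32-style PRNG, suitable for tests and not for anything security-sensitive. Production code passes the real clock and `Math.random`; tests pass fixed ones.

```typescript
type Clock = () => Date;
type Random = () => number; // returns a value in [0, 1)

// Takes "now" as an argument instead of calling new Date() internally.
function isExpired(expiresAt: Date, now: Clock): boolean {
  return now().getTime() >= expiresAt.getTime();
}

// mulberry32-style generator: the same seed always yields the same sequence.
function seededRandom(seed: number): Random {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Takes its randomness as an argument so a test can replay it.
function rollDie(random: Random): number {
  return 1 + Math.floor(random() * 6);
}

// In production: isExpired(token.expiresAt, () => new Date()); rollDie(Math.random)
// In a test:     a fixed clock and seededRandom(42), so two runs are identical.
```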

21. The boundary

What this does not cover

Do: Treat this dictionary as the settled ruling for building a secure, production-ready web app and backend without over-engineering: one app, one Postgres database, a managed auth provider, a monolith, validation at the edges. When a task crosses an edge below, stop applying these rules verbatim and reach for a specialist. Name the edge out loud (“this is data-engineering territory, not covered here”) rather than improvising past it.

Out of scope, go elsewhere:

Deep frontend. Design systems, animation, accessibility depth (beyond semantic HTML and labelled inputs), mobile/native. This dictionary gets you a correct, usable UI; it does not make you a design system.
Payments internals. The ruling here is “use Stripe and store money as integer minor units” (see the money entry). Interchange optimisation, double-entry ledgering, multi-party payouts, and PCI scope reduction are specialist territory.
Data engineering, analytics, and ML pipelines. Warehouses, dbt, streaming, feature stores, model training/serving. Your Postgres is an OLTP store, not a lakehouse.
Training and serving models. This dictionary covers calling a model safely (see Building AI features), not training or fine-tuning models, building RAG pipelines, or running model-serving infrastructure.
Fintech and trading internals. Beyond the ledger integrity invariants (see Money and ledgers), exchange and matching-engine design, settlement, regulatory licensing, financial circuit-breaker rules, and formal accounting standards are specialist territory.
Incident response and SRE depth. This dictionary gives you the controls to survive an incident (kill switch, circuit breakers, alerts, rollback). Runbooks, on-call rotation, formal post-mortems, and SLO error-budget process are a discipline of their own.
Infrastructure-as-code depth. A managed platform deploying a container is the default; authoring Terraform module hierarchies or running Kubernetes is not.
Networking and CDN tuning. Put a managed CDN in front of static assets and move on; BGP, anycast, cache-key engineering, and edge compute are out.
Compliance and legal specifics. GDPR, CCPA, HIPAA, SOC 2 obligations, data-residency, retention law, contracts. This dictionary encodes sane security defaults; it is not legal advice and does not certify you against any framework.
Self-hosting on your own VPS or bare metal. This dictionary assumes a managed platform or a cloud provider; it doesn’t yet cover a self-host pathway (systemd, your own process supervision, OS-level hardening), which has a different ops rule set. If you self-host, the data, security, and application rules here still apply, but the deploy and ops specifics won’t, so treat that as out of scope for now.

Never: Stretch a default past its edge just because this dictionary is silent: do not model a data warehouse in your app Postgres, hand-roll a double-entry ledger, or invent an accessibility framework. Silence here means “specialist,” not “improvise.”

Why: The rules are trustworthy precisely because they stop where confident, current knowledge stops. A ruling that claimed to cover everything would be safe to trust on nothing.

Colophon

The Agent’s Dictionary is a static site that serves clean, opinionated markdown for coding agents to read before they build. One markdown source drives both these human pages and the raw files agents fetch. No backend, no database, no client-side rendering.

It follows the llms.txt convention for agent-readable indexes, and the AGENTS.md / SKILL.md conventions for agent instructions.

What dates fastest here

Most of this dictionary is durable, but a few things rot and need a periodic check (the source carries flags on each):

Platform names, pricing, and free tiers (Railway, Render, Netlify, Cloudflare Pages).
Version-pinned client behaviour (for example, PgBouncer and Prisma connection-flag guidance).
Specific law and standard versions (the WCAG level, the consent regimes named in Privacy).

The capability rulings (“use a managed pooler”, “use object storage”, “use a transactional email provider”) don’t rot. Only the named examples do, so when refreshing, update the flagged lines and leave the rulings alone. Historical facts that were true stay true (uuidv7 landed in Postgres 18; the privacy settlements named in that section); they are illustrations, not a live tally.

Credits

It builds on others’ work:

Matt Pocock and others, for the idea of getting the agent to slow down and think before it writes.
Andrej Karpathy, for the early observations on how AI coding goes wrong.
Jeremy Howard, for the llms.txt convention this site uses.

Built by Chris Northfield. If something’s wrong or mis-credited, tell me and I’ll fix it.