0009 — TDD and stability as primary; the RLS isolation test is the keystone gate

Status: accepted
Date: 2026-05-06
Deciders: Derek

Context

Ark is built by a one-engineer team whose day job is low-level C++ graphics: a discipline-heavy, correctness-paramount, deeply tested practice. The pitch (“one engineer + AI agents at N tenants”) collapses if anything breaks unpredictably.

Combined with multi-tenancy via row-partitioning (ADR 0002), correctness becomes existential: a single missing or wrong RLS policy leaks data across organizations. The mitigation isn’t “be careful” — careful doesn’t scale.

Decision

Test-driven development is the default, not optional. Every package follows: failing test first, implementation second, passing test third. New behavior without a test is a review-blocker.

The RLS isolation test (packages/db/test/rls-isolation.spec.ts) is the keystone gate. It boots a real Postgres (Supabase local), applies all migrations, creates two organizations with members, exercises every CRUD path and every published-content read path, and asserts that no query in either direction can see data from the other org.
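The core assertion of such a test can be sketched as a small helper. This is a minimal illustration, not the contents of rls-isolation.spec.ts: the Row and Observation shapes, the org_id column name, and the findLeaks function are all hypothetical. It assumes the harness has already run each query against real Postgres under each org's session and recorded what came back.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Ark codebase.
// Assumes every tenant-scoped row carries an org_id column.
type Row = { id: string; org_id: string };

// What one org's session got back from one table or read path.
interface Observation {
  viewerOrgId: string;
  table: string;
  rows: Row[];
}

// Returns one message per row that is visible to an org that does not own it.
// An empty array means isolation held for every observation.
function findLeaks(observations: Observation[]): string[] {
  const leaks: string[] = [];
  for (const { viewerOrgId, table, rows } of observations) {
    for (const row of rows) {
      if (row.org_id !== viewerOrgId) {
        leaks.push(`${table}: ${viewerOrgId} can see row ${row.id} owned by ${row.org_id}`);
      }
    }
  }
  return leaks;
}

// Usage: org_a's session returned a row owned by org_b, so one leak is reported.
const sampleLeaks = findLeaks([
  {
    viewerOrgId: "org_a",
    table: "documents",
    rows: [
      { id: "d1", org_id: "org_a" },
      { id: "d2", org_id: "org_b" },
    ],
  },
]);
console.log(sampleLeaks);
```

A harness built this way would likely also need to confirm that each org can see its own seeded rows. Otherwise a policy that hides everything would pass with zero leaks.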

CI runs the RLS isolation test on every push. A red gate blocks merge. There is no merge-with-failing-tests escape valve.

New tables and policies extend the RLS test as part of their PR. A migration that adds a table without a corresponding case in the RLS isolation test is rejected. The migration linter (pnpm migrate:lint) checks structure; the test verifies behavior.
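To make the structural/behavioral split concrete, here is a sketch of one structural check a migration linter could perform: flag any table that is created but never has row level security enabled. This is an assumption about what "checks structurally" might mean, not a description of what pnpm migrate:lint actually does. The function names are hypothetical and the regex matching is deliberately naive.

```typescript
// Hypothetical structural check; NOT the actual behavior of `pnpm migrate:lint`.
// Simplifications: quotes are stripped and names lowercased, SQL comments are not
// skipped, and `notes` vs `public.notes` are treated as different tables.
function collectTableNames(sql: string, pattern: RegExp): string[] {
  const names: string[] = [];
  let match: RegExpExecArray | null;
  // `pattern` must carry the g flag so exec() advances through the string.
  while ((match = pattern.exec(sql)) !== null) {
    names.push(match[1].replace(/"/g, "").toLowerCase());
  }
  return names;
}

// Returns the tables created in the SQL that never get RLS enabled.
function tablesMissingRls(migrationSql: string): string[] {
  const created = collectTableNames(
    migrationSql,
    /create\s+table\s+(?:if\s+not\s+exists\s+)?([\w".]+)/gi,
  );
  const enabled = collectTableNames(
    migrationSql,
    /alter\s+table\s+(?:if\s+exists\s+)?(?:only\s+)?([\w".]+)\s+enable\s+row\s+level\s+security/gi,
  );
  return created.filter((name) => enabled.indexOf(name) === -1);
}

// Usage: public.tags is reported because nothing enables RLS on it; public.notes is not.
const sampleMigration = `
  create table public.notes (id uuid primary key, org_id uuid not null);
  alter table public.notes enable row level security;
  create table public.tags (id uuid primary key, org_id uuid not null);
`;
console.log(tablesMissingRls(sampleMigration));
```

A check like this can only say that RLS is switched on, not that the policies are right. That is why the behavioral test remains the gate.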

Consequences

Easier:

A single engineer + AI agents can confidently refactor — the tests are the safety net
Multi-tenancy correctness is observable and continuous, not a hopeful claim
Onboarding a new feature follows a predictable path
The test suite is the spec; documentation drift is bounded

Harder:

Velocity is tighter; you can’t merge an idea, you have to merge a verified idea
Setting up the RLS test harness (real Postgres in CI) is an upfront cost: paid once, used forever
Mocked tests are not allowed for the data layer (we know from Internalize what mocked-DB tests cost)

Trip-wires

We reconsider this stance only if:

A test-first practice provably blocks an urgent fix that has no other path forward (write the post-mortem; the answer is probably “we didn’t have the right test infrastructure,” not “TDD is wrong”)
The RLS isolation test fails in CI more than twice in a quarter (this signals our policy-writing process is too error-prone — see ADR 0002 trip-wires)

Alternatives considered

Test-after development with a test target. Looser; matches what most teams do; doesn’t give the same confidence at our scale of ambition (multi-tenant on shared infra). Not enough.
Property-based testing over example-based. Genuinely better in some places (RLS isolation could use it). We adopt it where it fits, but the default is example-based for clarity.
Mocked DB tests for speed. Rejected. Internalize already documented what mocked-Supabase tests cost. The DB is shared infrastructure; we test against the real thing.