PostgreSQL Multi-Tenancy: Islands, Apartments, or the Magic RLS Pill?
Audience: SaaS engineers who like their coffee strong, their logs quiet, and their data leaks non‑existent. Stack: PostgreSQL ≥ 14, Go ≥ 1.22, pgx/v5, and a mild fear of prod‑incident Slack pings.
0. Why You Care
You sell one codebase to many customers: cheap for them, lucrative for you. All good until Bob from Tenant‑42 suddenly reads Alice's unpaid invoices. Bob is delighted. Compliance is not. Solution: isolate, isolate, isolate.
1. Multitenancy Tactics in PostgreSQL
Strategy | What It Feels Like | Isolation | DevOps Headache | When It Shines |
---|---|---|---|---|
Database‑per‑Tenant | Airbnb for Databases. Every tenant gets a beachfront. | 🟢 Perfect | 🔴 Migrating 1→N schemas = existential dread | <100 tenants; strict compliance; deep pockets |
Schema‑per‑Tenant | Apartment complex. Same plumbing, separate rooms. | 🟡 Good | 🟠 Vacuuming thousands of schemas | 100‑5 000 tenants; medium compliance; sane wallet |
Row‑Level Security (RLS) | Co‑working space with fingerprint locks on every drawer. | 🟡 Logical isolation | 🟢 One schema, one migration | 5 000‑∞ tenants; cost sensitive; devs who like elegance |
We'll marry RLS. Cheaper than beachfront and tidier than 5 000 schema migrations at 3 a.m.
2. Row‑Level Security 101 - What the Docs Don't Joke About
Definition: RLS piggybacks on PostgreSQL's query rewriter. For each `SELECT`/`UPDATE`/`DELETE`/`INSERT`, Postgres injects invisible predicates defined by your `POLICY`. Users only touch rows blessed by those predicates.
Analogy: Think Gandalf popping into every query whispering "YOU SHALL NOT PASS… unless your tenant_id matches."
2.1 Internal Flow (abridged):
- Parse - same as usual.
- Rewrite - the RLS engine splices your policy expression into the query tree (think `bool_expr AND <your policy>`).
- Plan & Execute - the planner can push the predicate down into index scans; if `tenant_id` is indexed, you're golden.
- Result - rows outside the policy vanish like bad stand‑up jokes.
🔍 Want proof? `EXPLAIN (VERBOSE, COSTS OFF)` shows the extra `Filter: (tenant_id = (current_setting('app.tenant_id'))::integer)`.
2.2 Policy Types
- `USING` - gate for `SELECT`, `UPDATE`, `DELETE`.
- `WITH CHECK` - gate for the new rows of `INSERT` and `UPDATE`. When omitted, `WITH CHECK` defaults to the `USING` expression.
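A quick sketch of that default in action, assuming a `customers` table and `app.tenant_id` setting like the blueprint in section 3 (policy name is illustrative):

```sql
-- With only USING, WITH CHECK silently inherits the same expression,
-- so INSERTs of foreign rows are rejected too:
CREATE POLICY customers_isolation_demo
    ON customers
    USING (tenant_id = current_setting('app.tenant_id')::INT);

-- As a non-owner role with app.tenant_id = '1':
INSERT INTO customers (tenant_id, name, email)
    VALUES (2, 'Mallory', 'mallory@evil.test');
-- ERROR: new row violates row-level security policy for table "customers"
```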
3. Schema & Index Blueprint
```sql
CREATE TABLE tenants (
    id   INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE customers (
    id        INT  GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id INT  NOT NULL REFERENCES tenants (id) ON DELETE CASCADE,
    name      TEXT NOT NULL,
    email     TEXT NOT NULL
);

-- 1. turn the key
ALTER TABLE customers ENABLE ROW LEVEL SECURITY;

-- 2. main policy
CREATE POLICY customers_isolation
    ON customers
    USING (tenant_id = current_setting('app.tenant_id')::INT)
    WITH CHECK (tenant_id = current_setting('app.tenant_id')::INT);

-- 3. performance: composite index for most queries
CREATE INDEX ON customers (tenant_id, id);
```
Why a composite index? The planner can satisfy both the tenant filter and the PK lookup in one hop.
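A quick psql smoke test of the blueprint (hypothetical data; run it as a non-superuser role that doesn't own the table, since owners bypass RLS unless forced):

```sql
INSERT INTO tenants (name) VALUES ('acme'), ('globex');

SELECT set_config('app.tenant_id', '1', false);
INSERT INTO customers (tenant_id, name, email)
    VALUES (1, 'Alice', 'alice@acme.test');
SELECT count(*) FROM customers;  -- only tenant 1's rows are visible

SELECT set_config('app.tenant_id', '2', false);
SELECT count(*) FROM customers;  -- 0: tenant 2 sees nothing
```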
4. Performance Reality Check
Scenario | Overhead vs. No‑RLS |
---|---|
Cold cache, 1 000 000 rows, index on tenant_id | ≈ 2‑3 % slower (predicate added but index used) |
Sequential scan (no index) | 4‑10 % slower (extra filtering step) |
Complex OR policies | Varies - planner can't pushdown; expect 10 %‑ish |
Takeaways:
- Index the columns used in policies.
- Keep policy expressions sargable (don't wrap the indexed column in functions).
- Use `pg_stat_statements` to catch the planner choosing seq scans, then add indexes or tighten policies.
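For that last takeaway, a starting-point query (assuming the `pg_stat_statements` extension is installed; column names as of PostgreSQL 13+):

```sql
-- Slowest statements on average; seq-scanning RLS queries tend to float up.
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```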
5. Golang Wiring with pgx
Why `pgx`? Pure Go, no C strings attached, so your Docker image won't go hunting for a missing libpq.so at 2 a.m. It speaks Postgres natively: binary encodings, COPY, listen/notify, SCRAM, the works. You get a low-level driver for the days you miss pointer arithmetic and a batteries-included pool for the sane majority. Fewer moving parts, faster round-trips, better ergonomics: hard yes.
go get github.com/jackc/pgx/v5@latest
5.1 Tenant‑Aware Pool (BeforeAcquire + AfterRelease)
`pgxpool.Config` hands us two trusty hooks. We'll keep every connection squeaky‑clean on the way in and out.
```go
package db

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

type contextKey string

const ctxTenantKey contextKey = "ctx:tenant-id"

type Pool struct{ *pgxpool.Pool }

func New(ctx context.Context, url string) (*Pool, error) {
	cfg, err := pgxpool.ParseConfig(url)
	if err != nil {
		return nil, err
	}

	// BEFORE ACQUIRE - called right before a connection leaves the pool.
	// We set the current tenant from the request context.
	cfg.BeforeAcquire = func(ctx context.Context, conn *pgx.Conn) bool {
		tenantID, ok := ctx.Value(ctxTenantKey).(string)
		if !ok || tenantID == "" {
			return false // no tenant? reject - forces the caller to pass context correctly
		}
		// SET doesn't accept bind parameters, so use set_config() instead.
		if _, err := conn.Exec(ctx, `SELECT set_config('app.tenant_id', $1, false)`, tenantID); err != nil {
			return false // something went bang - discard the connection
		}
		return true
	}

	// AFTER RELEASE - called when the app returns a connection to the pool.
	// Good place to scrub session variables before somebody else grabs it.
	cfg.AfterRelease = func(conn *pgx.Conn) bool {
		// best-effort cleanup; ignore the error to avoid leaking the conn
		_, _ = conn.Exec(context.Background(), `RESET app.tenant_id;`)
		return true // returning false would discard the connection entirely
	}

	p, err := pgxpool.NewWithConfig(ctx, cfg)
	if err != nil {
		return nil, err
	}
	return &Pool{p}, nil
}
```
5.2 Handler Usage - Nothing Fancy
With tenant_id
injected in BeforeAcquire
, your handler can treat the pool like any other database. Just acquire, query, release:
```go
// ctx already carries the tenant ID courtesy of your middleware
conn, err := pool.Acquire(ctx)
if err != nil {
	// handle error
}
defer conn.Release()

rows, err := conn.Query(ctx, `SELECT id, name FROM customers ORDER BY name`)
if err != nil {
	// handle error
}
defer rows.Close()

for rows.Next() {
	var id int
	var name string
	if err := rows.Scan(&id, &name); err != nil {
		// handle error
	}
	// ... process row
}
if err := rows.Err(); err != nil {
	// handle error
}
```
No extra `SET` calls, no helper wrappers, zero `WHERE tenant_id` clutter - yet every query is still tenant‑scoped. Cleaner diffs, happier reviewers.
6. Gotchas That Keep Slack Awake
- Superusers bypass policies (as do roles with `BYPASSRLS`). QA with a normal role.
- Connection poolers (PgBouncer) in transaction mode drop session vars between transactions. Use session mode, or set the var inside every transaction with `SET LOCAL` / `set_config(..., true)`.
- Enabled ≠ enforced for owners. Table owners skip RLS by default; `ALTER TABLE ... FORCE ROW LEVEL SECURITY;` makes even them obey.
- Bulk admin jobs - give a dedicated maintenance role `ALTER ROLE maintenance_role BYPASSRLS;` (it's `BYPASSRLS`, not `NOINHERIT`, that skips policies), or wrap the work in `SECURITY DEFINER` functions owned by such a role.
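The last two gotchas in SQL form (role name illustrative):

```sql
-- Make even the table owner obey the policies:
ALTER TABLE customers FORCE ROW LEVEL SECURITY;

-- Dedicated role for bulk admin jobs that legitimately skips RLS:
CREATE ROLE maintenance_role LOGIN BYPASSRLS;
```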
7. Final Verdict
Separate DBs cost kidneys. Separate schemas breed entropy. RLS, when indexed and monitored, is the "just right" porridge of multitenancy. Ship it, monitor it, sleep.
8. TL;DR Cheat Sheet
- Enable: `ALTER TABLE ... ENABLE ROW LEVEL SECURITY;`
- Policy: `CREATE POLICY ... USING (tenant_id = current_setting('app.tenant_id')::INT);`
- Set var: `SELECT set_config('app.tenant_id', '42', false);` per request.
- Index: `(tenant_id, <frequent filter/PK>)`.
- Test: `EXPLAIN (VERBOSE)` plus a non-superuser role.
RLS: Because your SaaS should surprise users with features, not with each other's data.