Postgres Overview
Storage · Transactions · MVCC · Locks · Indexes · WAL
Follow One Ledger Transfer
Consider a transfer of SGD 100 from account A to account B. The application writes the transfer, the debit posting, the credit posting, and an outbox event in one transaction. PostgreSQL must make the whole operation atomic while other requests read balances and attempt transfers concurrently.
This flow connects the major internals. The heap stores the records. Indexes find an account's postings and reject a duplicate idempotency key. MVCC gives readers a consistent view. Locks serialize conflicting changes to the same account. WAL protects the commit even when the modified heap pages have not reached disk yet.
Storage: Heap Pages and Tuples
A table is primarily a heap: an unordered collection of 8 KB pages. Each page contains tuples, which are physical row versions. An index is a separate structure that maps a key to a tuple location in the heap.
The separation matters. Adding an index does not reorder the table. It creates another structure that must be updated and stored. Reads may become much faster, but writes become more expensive.
(account_id, created_at) gives PostgreSQL an ordered path to one account's history.
For physical storage, page layout, table forks, and TOAST, read The Internals of PostgreSQL, Chapter 1.
MVCC: Row Versions Instead of Read Locks
MVCC means Multi-Version Concurrency Control. PostgreSQL usually lets readers and writers proceed concurrently by keeping multiple versions of a row. A snapshot decides which committed versions a statement can see.
An UPDATE does not simply overwrite the old tuple. PostgreSQL creates a new tuple version and marks the old one as superseded. A reader with an older snapshot may still see the old version while a newer reader sees the new one.
The trade-off is cleanup. Old versions eventually become dead tuples. Vacuum marks their space reusable once no active snapshot can still need them. Long transactions can retain old snapshots and prevent that cleanup.
For tuple visibility, transaction IDs, HOT updates, freezing, and wraparound, see the PostgreSQL guides on concurrency control and routine vacuuming.
Locks: Coordination Where Versions Are Not Enough
MVCC reduces read and write blocking, but it does not remove conflicts. Two transactions attempting to update the same row cannot both proceed independently. One must wait, fail, or be retried depending on the operation and isolation level.
Locks also apply to tables and schema changes. Many ALTER TABLE operations need strong locks. A dangerous production pattern is a migration waiting behind a long query while newer queries queue behind the waiting migration.
Practical rule: treat migrations as concurrency events, not file changes. Understand the required lock and set a short lock_timeout so a migration fails instead of silently forming a queue.
Further reading: PostgreSQL explicit locking.
Indexes and the Query Planner
An index is a faster access path, not a guarantee that PostgreSQL will use it. The planner estimates the cost of available strategies and chooses between a sequential scan, an index scan, joins, sorts, and other operations.
For the ledger, two indexes have different correctness and performance jobs:
| Index | Purpose |
|---|---|
(account_id, created_at) | Read one account's posting history efficiently and in useful order. |
UNIQUE (idempotency_key) | Turn duplicate transfer submissions into a database-enforced invariant. |
Indexes cost storage, cache space, and extra work on every insert or update. The right question is not “can this column be indexed?” but “which important query or invariant does this index support, and is that benefit worth its write cost?”
Use EXPLAIN (ANALYZE, BUFFERS) when the observed plan matters. PostgreSQL's index documentation covers index types, multicolumn indexes, partial indexes, and index-only scans.
WAL: Commit the Log Before the Pages
WAL means Write-Ahead Log. PostgreSQL can modify a heap page in memory before writing that page to its table file. The durability rule is that the WAL describing a change must reach durable storage before the corresponding dirty data page does.
After a crash, PostgreSQL replays WAL to restore changes that were committed but not yet present in the table files. WAL is also the source stream for physical replication and logical decoding.
Each WAL position has an LSN, or Log Sequence Number. CDC systems should use source log positions to preserve order and resume safely. Wall-clock timestamps are not a reliable substitute for log order.
For WAL record structure, checkpoints, recovery, and replication, read the official WAL introduction.
Practical Rules
- Keep transactions short. Their locks and snapshots live until the transaction ends.
- Model money movement as balanced, atomic postings rather than direct balance mutations.
- Use database constraints and unique indexes for invariants that must survive concurrent requests.
- Treat every production migration as a locking operation.
- Use query plans and measured buffers to diagnose performance instead of guessing.
- Monitor transaction age, vacuum progress, table growth, and index growth together.
- Use WAL positions rather than timestamps for CDC ordering and restart points.
Continue Learning
Primary sources
- The Internals of PostgreSQL, Chapter 1: Database Cluster, Databases, and Tables
- PostgreSQL documentation: Concurrency Control
- PostgreSQL documentation: Routine Vacuuming
- PostgreSQL documentation: Indexes
- PostgreSQL documentation: Reliability and WAL
Videos
- MVCC Unmasked — Postgres Conference, 51 minutes
- Understanding and Managing Postgres Table Bloat — Citus Con / Microsoft Developer, 25 minutes
- Understanding Locking and Concurrency in PostgreSQL — PGConf India, 42 minutes
- PostgreSQL Indexing: How, Why, and When — PyCon AU, 31 minutes
- Explaining the Postgres Query Optimizer — Bruce Momjian, Postgres Open, 51 minutes
- How PostgreSQL Generates Query Execution Plans — Arpit Bhayani, visual introduction
- How an ALTER TABLE Caused a Massive GitHub Outage — Arpit Bhayani, production case study
- Up Against the WAL — Software Freedom Conservancy, 49 minutes