Understanding PostgreSQL Checkpoints: From WAL to Disk
· 4 min read
PostgreSQL relies on checkpoints to ensure data durability while maintaining performance. Understanding how checkpoints work and their relationship with Write-Ahead Logging is essential for database performance tuning and troubleshooting.
This post builds on fundamental concepts covered in our PostgreSQL internals series:
- Part 1: Memory vs Disk Performance
- Part 2: How Databases Store Data
- Part 3: Transactions and ACID
- Part 4: Performance Patterns
- Part 5: Write-Ahead Logging Deep Dive
- Part 6: Monitoring and Administration
Write-Ahead Logging: The Foundation
Checkpoints work hand-in-hand with Write-Ahead Logging (WAL). When PostgreSQL modifies data, changes are written to WAL files first (sequential, fast) before updating data pages in memory. Modified pages (called “dirty pages”) accumulate in shared_buffers, and eventually these changes need to be written to the actual data files. That’s where checkpoints come in.
What Happens During a Checkpoint
A checkpoint is PostgreSQL’s process of writing all dirty pages from shared buffers to disk. It creates a known recovery point and ensures data durability.
The Checkpoint Process
When a checkpoint occurs:
- Checkpoint starts
- PostgreSQL marks the current WAL position as the checkpoint location
- This position is the recovery starting point if a crash occurs
- Dirty pages are written
- All modified data pages in
shared_buffersare flushed to disk - This happens gradually to avoid I/O spikes (controlled by
checkpoint_completion_target) - Pages are written in order to minimize random I/O
- All modified data pages in
- Checkpoint completes
- A checkpoint record is written to WAL
- The
pg_controlfile is updated with the new checkpoint location - Old WAL files (before the checkpoint) can now be recycled or archived
What Triggers a Checkpoint?
PostgreSQL creates checkpoints based on:
- Time:
checkpoint_timeoutparameter (default: 5 minutes) - WAL volume:
max_wal_sizeparameter (default: 1GB) - Manual trigger:
CHECKPOINTcommand - Shutdown: Always creates a checkpoint during clean shutdown
1
2
-- Force an immediate checkpoint
CHECKPOINT;
Checkpoint Impact on Performance
Checkpoints involve heavy I/O (writing potentially gigabytes of dirty pages), which can cause temporary performance degradation. Understanding performance trade-offs helps you balance durability with speed. PostgreSQL spreads checkpoint writes over time using checkpoint_completion_target (default: 0.9) to minimize I/O spikes.
Monitoring Checkpoint Activity
For detailed monitoring techniques, see Part 6: Monitoring and Administration. Key metrics to track:
Check Checkpoint Statistics
1
2
3
4
5
-- PostgreSQL 17+
SELECT * FROM pg_stat_checkpointer;
-- PostgreSQL 16 and earlier
SELECT * FROM pg_stat_bgwriter;
What to look for:
- High
checkpoints_req(ornum_requestedin v17+) means checkpoints are happening too frequently - Large
checkpoint_write_time(orwrite_timein v17+) indicates heavy I/O load
Monitor WAL Generation Rate
High WAL generation can trigger frequent checkpoints. See Part 6 for detailed WAL monitoring queries and interpretation.
Tuning Checkpoint Behavior
Key parameters to adjust (see Part 4: Performance Patterns for trade-offs):
1
2
3
4
checkpoint_timeout = 15min # Default: 5min
max_wal_size = 4GB # Default: 1GB
checkpoint_completion_target = 0.9 # Default: 0.9
log_checkpoints = on
Guidelines:
- Increase
max_wal_sizeifcheckpoints_reqis high - Increase
checkpoint_timeoutfor write-heavy workloads - Keep
checkpoint_completion_targetat 0.9 to avoid I/O spikes
For broader PostgreSQL performance optimization, see PostgreSQL query optimization guide.
Conclusion
Checkpoints are PostgreSQL’s mechanism for persisting in-memory changes to disk, creating recovery points, and managing WAL files. They balance performance with durability by batching writes and spreading I/O over time. Watch for frequent requested checkpoints and long write times as signals for tuning opportunities.
For deeper understanding, explore the PostgreSQL internals series, or dive into PostgreSQL EXPLAIN ANALYZE for query optimization.