On 11/27/20, 10:58 AM, "Stephen Frost" <sfr...@snowman.net> wrote: > If you'd like to show that I'm wrong, and it's entirely possible that I > am, then retry the above with actual load on the system, and also > actually look at how much outstanding WAL you end up with given the > different scenarios which has to be replayed during crash recovery.
I did a little experiment to show the behavior I'm referring to. I used these settings: checkpoint_completion_target = 0.9 checkpoint_timeout = 30s max_wal_size = 20GB WAL segment size is 64MB I ran the following pgbench command for a few minutes before each test: pgbench postgres -T 3600 -c 64 -j 64 -N For the first test, I killed Postgres just before an automatic, non- immediate checkpoint completed. 2020-11-28 00:31:57 UTC::@:[51770]:LOG: checkpoint complete... 2020-11-28 00:32:00 UTC::@:[51770]:LOG: checkpoint starting: time Killed Postgres at 00:32:26 UTC, 29 seconds after latest checkpoint completed. 2020-11-28 00:32:42 UTC::@:[77256]:LOG: redo starts at 3CF/FD6B8BD0 2020-11-28 00:32:56 UTC::@:[77256]:LOG: redo done at 3D0/C94D1D00 Recovery took 14 seconds and replayed ~3.2 GB of WAL. postgres=> SELECT pg_wal_lsn_diff('3D0/C94D1D00', '3CF/FD6B8BD0'); pg_wal_lsn_diff ----------------- 3420557616 (1 row) For the second test, I killed Postgres just after an automatic, non- immediate checkpoint completed. 2020-11-28 00:41:26 UTC::@:[77475]:LOG: checkpoint complete... Killed Postgres at 00:41:26 UTC, just after latest checkpoint completed. 2020-11-28 00:41:42 UTC::@:[8599]:LOG: redo starts at 3D3/152EDD78 2020-11-28 00:41:49 UTC::@:[8599]:LOG: redo done at 3D3/78358A40 Recovery took 7 seconds and replayed ~1.5 GB of WAL. postgres=> SELECT pg_wal_lsn_diff('3D3/78358A40', '3D3/152EDD78'); pg_wal_lsn_diff ----------------- 1661381832 (1 row) Granted, I used a rather aggressive checkpoint_timeout, but I think this demonstrates that waiting for a non-immediate checkpoint to complete can lower the amount of WAL needed for recovery, even though it might not lower it as much as waiting for an immediate checkpoint would. Nathan