I found that this issue is a duplicate of [1]; after applying that patch, I can no longer reproduce it.
[1] https://www.postgresql.org/message-id/flat/20220316.102444.2193181487576617583.horikyota.ntt%40gmail.com

On Mar 16, 2022, at 3:16 PM, hao harry <harry-...@outlook.com> wrote:

Hi, pgsql-hackers,

I think I found a case where the database is not recoverable. Would you please take a look?

Here is how it happens:

- Set up a primary/standby pair.
- Do a lot of INSERTs on the primary.
- Create a checkpoint on the primary.
- Wait until the standby starts a restart point. Syncing buffers takes about 3 minutes to complete.
- Before the restart point updates the ControlFile, promote the standby. This changes ControlFile->state to DB_IN_PRODUCTION, which makes the restart point skip the ControlFile update and leaves ControlFile->checkPoint pointing to a removed file.
- Before the promoted standby requests the post-recovery checkpoint (fast promotion), one backend crashes. This kills the other server processes, so the post-recovery checkpoint is skipped.
- The database restarts, and the startup process reports: "could not locate a valid checkpoint record"

I attached a test to reproduce it. It does not fail every time; for me it fails about once every 10 runs. Patch 0001 is needed to increase the chance that CreateRestartPoint skips the ControlFile update and to simulate a crash.

Best regards,
Harry Hao

<0001-Patched-CreateRestartPoint-to-reproduce-invalid-chec.patch>
<reprod_crash_right_after_promoted.pl>