Kyotaro's patch looks good to me and fixes the test case in my patch. Would you be interested in adding a test like the one in my patch?

> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
>   /*
>    * Remember the prior checkpoint's redo ptr for
>    * UpdateCheckPointDistanceEstimate()
>    */
>   PriorRedoPtr = ControlFile->checkPointCopy.redo;
>
> + Assert (PriorRedoPtr < RedoRecPtr);

Maybe the PriorRedoPtr assignment does not need to be under LWLockAcquire?

regards.

--
Zhao Rui
Alibaba Cloud: https://www.aliyun.com/

------------------ Original ------------------
From: "Kyotaro Horiguchi" <horikyota....@gmail.com>;
Date: Wed, Mar 16, 2022 09:24 AM
To: "pgsql-hackers"<pgsql-hackers@lists.postgresql.org>;
Cc: "masao.fujii"<masao.fu...@oss.nttdata.com>;
Subject: Possible corruption by CreateRestartPoint at promotion

Hello, (Cc:ed Fujii-san)

This is a diverged topic from [1], which is summarized as $SUBJECT.

To recap:

While discussing additional LSNs in the checkpoint log message,
Fujii-san pointed out [2] that there is a case where
CreateRestartPoint leaves an unrecoverable database when a concurrent
promotion happens. That corruption is "fixed" by the next checkpoint,
so it is not a severe corruption.

AFAICS, since 9.5 no check(/restart)points run concurrently with a
restartpoint [3]. So I propose to remove the code path as attached.

regards.

[1] https://www.postgresql.org/message-id/20220316.091913.806120467943749797.horikyota.ntt%40gmail.com
[2] https://www.postgresql.org/message-id/7bfad665-db9c-0c2a-2604-9f54763c5f9e%40oss.nttdata.com
[3] https://www.postgresql.org/message-id/20220222.174401.765586897814316743.horikyota.ntt%40gmail.com

--
Kyotaro Horiguchi
NTT Open Source Software Center

0001-Test-of-this-problem-v14.patch
Description: Binary data