Re: Standby got invalid primary checkpoint after crashed right after promoted.

Kyotaro Horiguchi Wed, 16 Mar 2022 01:29:05 -0700

At Wed, 16 Mar 2022 07:16:16 +0000, hao harry <[email protected]> wrote in 
> Hi, pgsql-hackers,
> 
> I think I found a case in which the database is not recoverable. Would you
> please take a look?
> 
> Here is how it happens:
> 
> - set up a primary/standby pair
> - do a lot of INSERTs on the primary
> - create a checkpoint on the primary
> - wait until the standby starts a restart point; syncing buffers takes
>   about 3 minutes to complete
> - before the restart point updates ControlFile, promote the standby; that
>   changes ControlFile->state to DB_IN_PRODUCTION, so the restart point
>   skips the ControlFile update, leaving ControlFile->checkPoint pointing
>   to a removed file


Yeah, it seems to be exactly the same issue pointed out in [1]. A fix is
proposed in [1]. Maybe I can remove "possible" from the mail subject :p

[1] https://www.postgresql.org/message-id/7bfad665-db9c-0c2a-2604-9f54763c5f9e%40oss.nttdata.com
[2] https://www.postgresql.org/message-id/[email protected]

> - before the promoted standby requests the post-recovery checkpoint (fast
>   promotion), one backend crashes, which kills the other server processes,
>   so the post-recovery checkpoint is skipped
> - the database restarts, and the startup process reports: "could not
>   locate a valid checkpoint record"
> 
> I attached a test to reproduce it. It does not fail every time; for me it
> fails about once in every 10 runs. To increase the chance that
> CreateRestartPoint skips the ControlFile update, and to simulate a crash,
> patch 0001 is needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
