On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> While testing the bug from the "Assertion failure at standby promotion", I
> bumped into a different bug in fast promotion. When the first checkpoint
> after fast promotion is performed, there is no guarantee that the
> checkpointer process is running with the correct, new, ThisTimeLineID. In
> CreateCheckPoint(), we have this:
>
>>         /*
>>          * An end-of-recovery checkpoint is created before anyone is allowed to
>>          * write WAL. To allow us to write the checkpoint record, temporarily
>>          * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
>>          * initialized, which we need here and in AdvanceXLInsertBuffer.)
>>          */
>>         if (flags & CHECKPOINT_END_OF_RECOVERY)
>>                 LocalSetXLogInsertAllowed();
>
>
> That ensures that ThisTimeLineID is updated when performing an
> end-of-recovery checkpoint, but it doesn't get executed with fast promotion.
> The consequence is that the checkpoint is created with the old timeline, and
> subsequent recovery from it will fail.
>
> I ran into this with the attached script. It sets up a master (M), a standby
> (B), and a cascading standby (C). I'm not sure why, but when I tried to
> simplify the script by removing the cascading standby, it started to work.
> The bug occurs in standby B, so I'm not sure why the presence of the
> cascading standby makes any difference. Maybe it just affects the timing.
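
For readers following along, the scenario under discussion can be modeled
roughly as below. This is only a toy sketch, not PostgreSQL source: every
type and function name (SharedState, ProcessState, toy_*) is hypothetical,
and it assumes just what the thread states, namely that each process keeps
its own cached timeline ID and that the cache is refreshed on the
RecoveryInProgress()->InitXLOGAccess() path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef unsigned int TimeLineID;

/* Stand-in for shared state that promotion updates. */
typedef struct {
    bool       recovery_in_progress;
    TimeLineID timeline;
} SharedState;

/* Stand-in for one process's private, cached view of that state. */
typedef struct {
    bool       local_recovery_in_progress;
    TimeLineID this_timeline;
} ProcessState;

/* Models InitXLOGAccess(): refresh the cached timeline from shared state. */
void toy_init_xlog_access(ProcessState *p, const SharedState *s)
{
    p->this_timeline = s->timeline;
}

/*
 * Models RecoveryInProgress(): the first time the process notices that
 * recovery has ended, it also refreshes its cached timeline.
 */
bool toy_recovery_in_progress(ProcessState *p, const SharedState *s)
{
    if (!p->local_recovery_in_progress)
        return false;               /* already saw the end of recovery */

    p->local_recovery_in_progress = s->recovery_in_progress;
    if (!p->local_recovery_in_progress)
        toy_init_xlog_access(p, s);

    return p->local_recovery_in_progress;
}

/* Models promotion: switch to a new timeline and end recovery. */
void toy_promote(SharedState *s)
{
    assert(s->recovery_in_progress);
    s->timeline++;
    s->recovery_in_progress = false;
}

/* Models a checkpoint: it is stamped with the process's cached timeline. */
TimeLineID toy_create_checkpoint(const ProcessState *p)
{
    return p->this_timeline;
}
```

In this model, a process that calls toy_create_checkpoint() after
toy_promote() without first going through toy_recovery_in_progress() stamps
the checkpoint with the old timeline; one that does make the call first
picks up the new one. Whether the real checkpointer can actually skip that
step is the open question in this thread.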

Can this really happen? ISTM that the checkpointer should detect that
recovery mode has ended and call RecoveryInProgress()->InitXLOGAccess()
before calling CreateCheckPoint().

Regards,

-- 
Fujii Masao