On Sat, Jun 4, 2022 at 6:29 PM James Coleman <jtc...@gmail.com> wrote:
>
> A few weeks back I sent a bug report [1] directly to the -bugs mailing
> list, and I haven't seen any activity on it (maybe this is because I
> emailed directly instead of using the form?), but I got some time to
> take a look and concluded that a first-level fix is pretty simple.
>
> A quick background refresher: after promoting a standby rewinding the
> former primary requires that a checkpoint have been completed on the
> new primary after promotion. This is correctly documented. However
> pg_rewind incorrectly reports to the user that a rewind isn't
> necessary because the source and target are on the same timeline.
>
> Specifically, this happens when the control file on the newly promoted
> server looks like:
>
>   Latest checkpoint's TimeLineID:       4
>   Latest checkpoint's PrevTimeLineID:   4
>   ...
>   Min recovery ending loc's timeline:   5
>
> Attached is a patch that detects this condition and reports it as an
> error to the user.
>
> In the spirit of the new-ish "ensure shutdown" functionality I could
> imagine extending this to automatically issue a checkpoint when this
> situation is detected. I haven't started to code that up, however,
> wanting to first get buy-in on that.
>
> 1: 
> https://www.postgresql.org/message-id/CAAaqYe8b2DBbooTprY4v=bized9qbqvlq+fd9j617eqfjk1...@mail.gmail.com
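
For anyone following along, the state described above can be spotted from pg_controldata output with something like the sketch below. It is only an illustration in Python, not the logic of the attached patch. The function names are hypothetical, and the comparison used (min recovery timeline greater than the latest checkpoint's timeline) is an assumption about what characterizes a promoted server that has not yet completed a checkpoint.

```python
def parse_controldata(text):
    """Map each pg_controldata label to its value (both as strings)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps
        # may themselves contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def promotion_checkpoint_pending(text):
    """Hypothetical check: True if the control data looks like a newly
    promoted server whose post-promotion checkpoint has not completed.

    Assumption: that state shows up as the min recovery ending
    location's timeline being ahead of the latest checkpoint's timeline.
    """
    fields = parse_controldata(text)
    checkpoint_tli = int(fields["Latest checkpoint's TimeLineID"])
    min_recovery_tli = int(fields["Min recovery ending loc's timeline"])
    return min_recovery_tli > checkpoint_tli


# Values taken from the report quoted above.
sample = """\
Latest checkpoint's TimeLineID:       4
Latest checkpoint's PrevTimeLineID:   4
Min recovery ending loc's timeline:   5
"""

if __name__ == "__main__":
    print(promotion_checkpoint_pending(sample))
```

With the values from the report (4, 4, 5) the check returns True. It returns False once the latest checkpoint's timeline has caught up, so an external script could use it to decide whether a checkpoint is still needed on the new primary before running pg_rewind.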

Thanks. I had a quick look over the issue and the patch. Just a thought: can't we let pg_rewind issue a checkpoint on the new primary instead of erroring out, maybe optionally? It might sound like too much, but it would help pg_rewind be self-reliant, i.e. it avoids needing an external actor to detect the error and issue a checkpoint on the new primary before pg_rewind can successfully run on the old primary and repair it for use as a new standby.

Regards,
Bharath Rupireddy.