At Wed, 28 Jun 2023 22:28:13 +0900, torikoshia <torikos...@oss.nttdata.com> wrote in
> On 2022-09-29 17:18, Polina Bungina wrote:
> > I agree with your suggestions, so here is the updated version of
> > patch. Hope I haven't missed anything.
> > Regards,
> > Polina Bungina
>
> Thanks for working on this!
> It seems like we are also facing the same issue.

Thanks for looking at this.

> I tested the v3 patch under our condition, old primary has succeeded
> to become new standby.
>
>
> BTW when I used pg_rewind-removes-wal-segments-reproduce.sh attached
> in [1], old primary also failed to become standby:
>
>   FATAL: could not receive data from WAL stream: ERROR: requested WAL
>   segment 000000020000000000000007 has already been removed
>
> However, I think this is not a problem: just adding restore_command
> like below fixed the situation.
>
>   echo "restore_command = '/bin/cp `pwd`/newarch/%f %p'" >> oldprim/postgresql.conf

I thought along the same lines at first, but that's not the point
here. The problem we want to address is that pg_rewind ultimately
removes certain crucial WAL files required for the new primary to
start, despite them being present previously. In other words, that
restore_command works, but it only undoes what pg_rewind wrongly did,
resulting in unnecessary consumption of I/O and/or network bandwidth
that essentially serves no purpose.

pg_rewind already has a feature that determines how each file should
be handled, but it is currently making wrong decisions for WAL
files. The goal here is to rectify this behavior and ensure that
pg_rewind makes the right decisions.

> Attached modified reproduction script for reference.
>
> [1]https://www.postgresql.org/message-id/CAFh8B%3DnNiFZOAPsv49gffxHBPzwmZ%3D6Msd4miMis87K%3Dd9rcRA%40mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center