Hello Kyotaro,

On Tue, 30 Aug 2022 at 07:50, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> So, if I understand you correctly, the issue you are complaining is
> not about the WAL segments on the old timeline but about those on the
> new timeline, which don't have a business with what pg_rewind does. As
> the same with the case of pg_basebackup, the missing segments need to
> be somehow copied from the new primary since the old primary never had
> the chance to have them before.
>

No, we are complaining exactly about WAL segments from the old timeline
that are removed by pg_rewind. Those segments haven't been archived by the
old primary, and the new primary has already recycled them.

>
> Thus I don't follow this..
>

I made a slight modification to your script so that it reproduces the problem.

====
mkdir newarch oldarch
initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/oldarch/%f'">> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/newarch/%f'">> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload

# advance WAL on the old primary; four WAL segments will never make it to the archive
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done

# record approx. diverging WAL segment
start_wal=`psql -p 5432 -Atc "select pg_walfile_name(pg_last_wal_replay_lsn() - (select setting from pg_settings where name = 'wal_segment_size')::int);"`
pg_ctl -D newprim promote

# old primary loses diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/oldarch/%f %p'">> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the new primary; new primary loses the launching WAL seg
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

postgres -D oldprim # fails with "WAL file has been removed"

# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch; for f in `ls`; do if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi done)

postgres -D oldprim # also fails with "requested WAL segment XXX has already been removed"
===

Regards,
--
Alexander Kukushkin