At Tue, 30 Aug 2022 14:50:26 +0900 (JST), Kyotaro Horiguchi <horikyota....@gmail.com> wrote in
> AFAICS pg_rewind doesn't. -c option contrarily restores all the
> segments after the last (common) checkpoint and all of them are left
> alone after pg_rewind finishes. postgres itself removes the WAL files
> after recovery. After-promotion cleanup and checkpoint removes the
> files on the previous timeline.
>
> Before pg_rewind runs in the repro below, the old primary has the
> following segments.
>
> TLI1: 2 8 9 A B C D
>
> Just after pg_rewind finishes, the old primary has the following
> segments.
>
> TLI1: 2 3 5 6 7
> TLI2: 4 (and 00000002.history)
>
> pg_rewind copied 1-2 to 1-3 and 2-4 and history file from the new
> primary, 1-4 to 1-7 from archive. After rewind finished, 1-4,1-8 to
> 1-D have been removed since the new primary didn't have them.
>
> Recovery starts from 1-3 and promotes at 0/4_000000. postgres removes
> 1-5 to 1-7 by post-promotion cleanup and removes 1-2 to 1-4 by a
> restartpoint. All of the segments are useless after the old primary
> promotes.
>
> When the old primary starts, it uses 1-3 and 2-4 for recovery and
> fails to fetch 2-5 from the new primary. But it is not an issue of
> pg_rewind at all.
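For anyone mapping the "T-S" shorthand above (segment S on timeline T) onto the actual file names in pg_wal, here is a minimal sketch. It assumes the default 16MB wal_segment_size (256 segments per "xlogid"); walname is a hypothetical helper used only for illustration, not something shipped with PostgreSQL.

```shell
# Hypothetical helper: expand "TLI-SEG" shorthand (both parts in hex,
# e.g. 1-3 or 1-D) into a 24-hex-digit WAL segment file name.
# Assumption: default 16MB segments, i.e. 256 segments per xlogid.
walname() (
    t=${1%%-*}            # part before the dash: timeline ID
    s=${1##*-}            # part after the dash: segment number
    tli=$((0x$t))
    seg=$((0x$s))
    printf '%08X%08X%08X\n' "$tli" $((seg / 256)) $((seg % 256))
)

walname 1-3
walname 2-4
walname 1-D
```

With this, "1-3 to 1-6" corresponds to the files matched by 00000001000000000000000[3456] in the repro script.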
Ah. I think I understand what you are mentioning. If the new primary
doesn't have segments 1-3 to 1-6, pg_rewind removes them. The new
primary has them neither in pg_wal nor in its archive. The old primary
has them in its archive. So, to get out of this situation, we need to
do the following *two* things before the old primary can start:

1. copy 1-3 to 1-6 from the archive of the *old* primary
2. copy 2-7 and later from the archive of the *new* primary

Since pg_rewind has already copied them into the old primary's pg_wal,
removing them just forces users to perform the task twice, as you
stated.

Okay, I completely understand the problem and am convinced that it is
worth changing the behavior.

However, the proposed patch looks too complex to me. It can be done
by just comparing the xlog file name with the last checkpoint location
and TLI in decide_file_actions().

regards.

=====
# killall -9 postgres
# rm -r oldprim newprim oldarch newarch oldprim.log newprim.log
mkdir newarch oldarch

initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/oldarch/%f'">> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo "archive %f" >&2; cp %p `pwd`/newarch/%f'">> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# record approx. diverging WAL segment
start_wal=`psql -p 5433 -Atc "select pg_walfile_name(pg_last_wal_replay_lsn() - (select setting from pg_settings where name = 'wal_segment_size')::int);"`
for i in $(seq 1 5); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint'
pg_ctl -D newprim promote

# old primary loses diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload
pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/oldarch/%f %p'">> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the new primary; new primary loses the launching WAL seg
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

#### copy the missing file of the old timeline
## cp oldarch/00000001000000000000000[3456] oldprim/pg_wal
## cp newarch/00000002000000000000000* oldprim/pg_wal

postgres -D oldprim  # fails with "WAL file has been removed"

# The alternative of copying-in
# echo "restore_command = 'echo "restore %f" >&2; cp `pwd`/newarch/%f %p'">> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch;
for f in `ls`; do
  if [[ "$f" > "$start_wal" ]]; then echo copy $f; cp $f ../oldprim/pg_wal; fi
done)

postgres -D oldprim
=====

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
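P.S. To make the comparison suggested above concrete, here is a rough shell sketch of the kind of file-name check I have in mind. It is only an illustration, not code from pg_rewind and not the proposed patch: should_keep_wal is a hypothetical name, and the rule it applies (same timeline as the last common checkpoint, segment number at or after the checkpoint's segment) is an assumption about how such a check could look. An upper bound at the divergence point is deliberately left out.

```shell
# Hypothetical check: should a file found in the target's pg_wal be kept?
#   $1 = file name in the target's pg_wal
#   $2 = name of the WAL segment that holds the last common checkpoint
# Returns 0 (keep) when $1 is a plain WAL segment on the same timeline as
# the checkpoint and is not older than the checkpoint's segment.
should_keep_wal() (
    f=$1
    ckpt=$2
    # Only plain segment names qualify: exactly 24 uppercase hex digits.
    case $f in
        *[!0-9A-F]*) exit 1 ;;
    esac
    [ "${#f}" -eq 24 ] || exit 1
    # First 8 digits are the timeline ID; they must match the checkpoint's.
    [ "${f%????????????????}" = "${ckpt%????????????????}" ] || exit 1
    # Remaining 16 digits order the segments within a timeline.
    fseg=${f#????????}
    cseg=${ckpt#????????}
    [ $((0x$fseg)) -ge $((0x$cseg)) ]
)

# Example with the segments from this thread, assuming the last common
# checkpoint lives in segment 1-3.
for f in 000000010000000000000002 000000010000000000000003 \
         000000010000000000000005 000000020000000000000004 00000002.history; do
    if should_keep_wal "$f" 000000010000000000000003; then
        echo "keep  $f"
    else
        echo "other $f"    # left to the existing file-action rules
    fi
done
```

Files that fail the check are not necessarily removed; in this sketch they simply fall through to whatever the existing rules decide.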