At Fri, 21 May 2021 11:21:05 +0900 (JST), Kyotaro Horiguchi <horikyota....@gmail.com> wrote in > At Thu, 20 May 2021 13:49:10 -0400, Robert Haas <robertmh...@gmail.com> wrote > in > In the case of (c) recoveryTargetTLI > checkpoint TLI. In this case > we expecte that checkpint TLI is in the history of > recoveryTargetTLI. Otherwise recovery failse. This case is similar > to the case (a) but the relationship between recoveryTargetTLI and > the checkpoint TLI is not confirmed yet. ReadRecord barks later if > they are not compatible so there's not a serious problem but might > be better checking the relation ship there. My first proposal > performed mutual check between the two but we need to check only > unidirectionally. > > if (readFile < 0) > { > if (!expectedTLEs) > { > expectedTLEs = readTimeLineHistory(receiveTLI); > + if (!tliOfPointInHistory(receiveTLI, expectedTLEs)) > + ereport(ERROR, "the received timeline %d is not found in the > history file for timeline %d"); > > > > > Conclusion: > > > - I think now we agree on the point that initializing expectedTLEs > > > with the recovery target timeline is the right fix. > > > - We still have some differences of opinion about what was the > > > original problem in the base code which was fixed by the commit > > > (ee994272ca50f70b53074f0febaec97e28f83c4e). > > > > I am also still concerned about whether we understand in exactly what > > cases the current logic doesn't work. We seem to more or less agree on > > the fix, but I don't think we really understand precisely what case we > > are fixing. > > Does the discussion above make sense?
This is a revised version. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 6664808340494dffc775e21cfa1029d6dfb8452e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 21 May 2021 16:24:14 +0900 Subject: [PATCH v3] Set expectedTLEs correctly based on recoveryTargetTLI When a standby started streaming before it determines expectedTLEs, it is determined based on the timeline of the WAL segment just streamed from the primary. If the standby fetches the older timeline than recoveryTargetTLI for thestarting checkpoint, this behavior leads to setting expectedTLEs based on the older timeline than recoveryTargetTLI. It is inconsistent with the definition of the variable and prevents later calls to rescanLatestTimeLine from updating recoveryTargetTLI and expectedTLEs correctly. If that happens the standby stops following the upstream. This behavior has been introduced by commit ee994272ca but there seems not a reason for the behavior and looks like a typo. Since the relationship between the two timeline IDs is not verified until then, also add a check for the relationship for safety's sake. --- src/backend/access/transam/xlog.c | 35 ++++++++++---- src/test/recovery/t/004_timeline_switch.pl | 55 +++++++++++++++++++++- 2 files changed, 79 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 441a9124cd..fda729c63f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -12659,22 +12659,37 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, * info is set correctly and XLogReceiptTime isn't * changed. */ - if (readFile < 0) - { - if (!expectedTLEs) - expectedTLEs = readTimeLineHistory(receiveTLI); - readFile = XLogFileRead(readSegNo, PANIC, - receiveTLI, - XLOG_FROM_STREAM, false); - Assert(readFile >= 0); - } - else + if (readFile >= 0) { /* just make sure source info is correct... */ readSource = XLOG_FROM_STREAM; XLogReceiptSource = XLOG_FROM_STREAM; return true; } + + readFile = XLogFileRead(readSegNo, PANIC, + receiveTLI, + XLOG_FROM_STREAM, false); + Assert(readFile >= 0); + + if (expectedTLEs) + break; + + expectedTLEs = readTimeLineHistory(recoveryTargetTLI); + + /* + * We haven't checked the relationship beween the + * receiveTLI and recoveryTargetTLI. Make sure that + * receiveTLI is in the history for + * recoveryTargetTLI. We don't need to do that + * hearafter since recoveryTargetTLI and expectedTLEs + * will be kept in sync. + */ + if (!tliOfPointInHistory(receiveTLI, expectedTLEs)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("hisotry file of timeline %d is incosistent with the current timeline %d", + recoveryTargetTLI, receiveTLI))); break; } diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl index c101980e9e..357f3bf8fa 100644 --- a/src/test/recovery/t/004_timeline_switch.pl +++ b/src/test/recovery/t/004_timeline_switch.pl @@ -90,7 +90,14 @@ $node_primary_2->backup($backup_name); # Create standby node my $node_standby_3 = get_new_node('standby_3'); $node_standby_3->init_from_backup($node_primary_2, $backup_name, - has_streaming => 1); + has_streaming => 1, allows_streaming => 1, has_archiving => 1); +my $archivedir_standby_3 = $node_standby_3->archive_dir; +$node_standby_3->append_conf( + 'postgresql.conf', qq( +archive_mode = always +archive_command = 'cp "%p" "$archivedir_standby_3/%f"' +log_line_prefix = '%m [%p:%b] %q%a ' +)); # Restart primary node in standby mode and promote it, switching it # to a new timeline. @@ -109,3 +116,49 @@ $node_primary_2->wait_for_catchup($node_standby_3, 'replay', my $result_2 = $node_standby_3->safe_psql('postgres', "SELECT count(*) FROM tab_int"); is($result_2, qq(1), 'check content of standby 3'); + + +# Check that server can start from a backup took from a standby that +# contains timeline switch midst of the first segment. + +# setup a standby connects to standby_3 +my $node_standby_4 = get_new_node('standby_4'); +$node_standby_4->init_from_backup($node_primary_2, $backup_name, + has_streaming => 1); +$node_standby_4->start; + +$node_primary_2->psql('postgres', qq[ + CREATE TABLE t (a int); + SELECT pg_switch_wal(); + INSERT INTO t VALUES (0); + CHECKPOINT; +]); +$node_primary_2->wait_for_catchup($node_standby_3, 'write', + $node_primary_2->lsn('insert')); +$node_primary_2->wait_for_catchup($node_standby_4, 'write', + $node_primary_2->lsn('insert')); + +# promote standby_3 then connect standby_4 to standby_3. To enforce +# archive fetching, fluash pg_wal files of the new standby. +$node_standby_4->stop; +$node_standby_3->psql('postgres', 'SELECT pg_promote()'); +$node_standby_4->enable_streaming($node_standby_3); +$node_standby_4->append_conf( + 'postgresql.conf', + qq[restore_command = 'cp "$archivedir_standby_3/%f" "%p"']); + +# clean up pg_wal on the second standby +my $pgwaldir = $node_standby_4->data_dir. "/pg_wal"; +opendir my $dh, $pgwaldir or die "failed to open $pgwaldir"; +while (my $f = readdir($dh)) +{ + unlink("$pgwaldir/$f") if (-f "$pgwaldir/$f"); +} +closedir($dh); + +$node_standby_4->start; + +# check if replication is working +$node_standby_3->psql('postgres', 'INSERT INTO t VALUES(1)'); +$node_standby_3->wait_for_catchup($node_standby_4, 'write', + $node_standby_3->lsn('insert')); -- 2.27.0