Mark Dilger <mark.dil...@enterprisedb.com> writes:
> Perhaps having the bloom index messed up answers that, though.  I think it
> should be easy enough to get the path to the heap main table fork and the
> bloom main index fork for both the primary and standby and do a filesystem
> comparison as part of the wal test.  That would tell us if they differ, and
> also if the differences are limited to just one or the other.

I think that's probably overkill, and definitely out-of-scope for
contrib/bloom.  If we fear that WAL replay is not reproducing the data
accurately, we should be testing for that in some more centralized place.

Anyway, I confirmed my diagnosis by adding a delay in WAL apply (0001
below); that makes this test fall over spectacularly.  And 0002 fixes it.
So I propose to push 0002 as soon as the v14 release freeze ends.

Should we back-patch 0002?  I'm inclined to think so.  Should we then
also back-patch enablement of the bloom test?  Less sure about that,
but I'd lean to doing so.  A test that appears to be there but isn't
actually invoked is pretty misleading.

			regards, tom lane

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749d..eecbe57aee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7370,6 +7370,9 @@ StartupXLOG(void)
 			{
 				bool		switchedTLI = false;
 
+				if (random() < INT_MAX/100)
+					pg_usleep(100000);
+
 #ifdef WAL_DEBUG
 				if (XLOG_DEBUG ||
 					(rmid == RM_XACT_ID && trace_recovery_messages <= DEBUG2) ||
diff --git a/contrib/bloom/t/001_wal.pl b/contrib/bloom/t/001_wal.pl
index 55ad35926f..be8916a8eb 100644
--- a/contrib/bloom/t/001_wal.pl
+++ b/contrib/bloom/t/001_wal.pl
@@ -16,12 +16,10 @@ sub test_index_replay
 {
 	my ($test_name) = @_;
 
+	local $Test::Builder::Level = $Test::Builder::Level + 1;
+
 	# Wait for standby to catch up
-	my $applname = $node_standby->name;
-	my $caughtup_query =
-	  "SELECT pg_current_wal_lsn() <= write_lsn FROM pg_stat_replication WHERE application_name = '$applname';";
-	$node_primary->poll_query_until('postgres', $caughtup_query)
-	  or die "Timed out while waiting for standby 1 to catch up";
+	$node_primary->wait_for_catchup($node_standby);
 
 	my $queries = qq(SET enable_seqscan=off;
 SET enable_bitmapscan=on;
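
To spell out the race the two patches are about: the query removed by 0002 compares the primary's LSN against the standby's write_lsn, which only says the WAL has been received and written out, not that it has been applied. As far as I know, wait_for_catchup waits on replay_lsn by default, which is what a test that reads the standby needs. Below is a minimal sketch of that difference in C. It is a toy model and not PostgreSQL code: the StandbyProgress struct and both function names are made up for illustration, and LSNs are reduced to plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified picture of one standby's progress.
 * The field names mirror the pg_stat_replication columns they stand for.
 */
typedef struct StandbyProgress
{
	uint64_t	write_lsn;		/* WAL received and written by the standby */
	uint64_t	replay_lsn;		/* WAL actually applied on the standby */
} StandbyProgress;

/* The condition the old test query polled for: primary LSN <= write_lsn. */
bool
caught_up_by_write(uint64_t primary_lsn, const StandbyProgress *s)
{
	return primary_lsn <= s->write_lsn;
}

/* The stricter condition: primary LSN <= replay_lsn. */
bool
caught_up_by_replay(uint64_t primary_lsn, const StandbyProgress *s)
{
	return primary_lsn <= s->replay_lsn;
}
```

With a standby that has written everything up to LSN 1000 but replayed only up to 400, caught_up_by_write(1000, ...) is already true while caught_up_by_replay(1000, ...) is still false. A delay in WAL apply such as the one in 0001 widens exactly that window, so queries run on the standby right after the write-based check can see stale data.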