Re: Fixing WAL instability in various TAP tests

2021-11-03 Thread Mark Dilger
> On Oct 21, 2021, at 3:23 PM, Bossart, Nathan wrote: > > Do we intend to proceed with those, or should we just > close out the Commitfest entry? I have withdrawn the patch. The issues were intermittent on the buildfarm, and committing other changes along with what Tom already committed wo

Re: Fixing WAL instability in various TAP tests

2021-10-21 Thread Bossart, Nathan
On 9/28/21, 8:17 PM, "Michael Paquier" wrote: > On Tue, Sep 28, 2021 at 03:00:13PM -0400, Tom Lane wrote: >> Should we back-patch 0002? I'm inclined to think so. Should >> we then also back-patch enablement of the bloom test? Less >> sure about that, but I'd lean to doing so. A test that appea

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Michael Paquier
On Tue, Sep 28, 2021 at 03:00:13PM -0400, Tom Lane wrote: > Should we back-patch 0002? I'm inclined to think so. Should > we then also back-patch enablement of the bloom test? Less > sure about that, but I'd lean to doing so. A test that appears > to be there but isn't actually invoked is prett

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Tom Lane
Mark Dilger writes: > Perhaps having the bloom index messed up answers that, though. I think it > should be easy enough to get the path to the heap main table fork and the > bloom main index fork for both the primary and standby and do a filesystem > comparison as part of the wal test. That w

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Mark Dilger
> On Sep 28, 2021, at 11:07 AM, Mark Dilger > wrote: > > Looking closer at the TAP test, it's not ORDERing the result set from the > SELECTs on either node, but it is comparing the sets for stringwise equality, > which is certainly order dependent. Taking the output from the buildfarm page

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Tom Lane
Mark Dilger writes: > Looking closer at the TAP test, it's not ORDERing the result set from the > SELECTs on either node, but it is comparing the sets for stringwise equality, > which is certainly order dependent. Well, it's forcing a bitmap scan, so what we're getting is the native ordering of

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Tom Lane
I wrote: > So there's more than one symptom, but in any case it seems like > we have an issue in WAL replay. I wonder whether it's bloom's fault > or a core bug. Actually ... I bet it's just the test script's fault. It waits for the standby to catch up like this: my $caughtup_query =

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Mark Dilger
> On Sep 28, 2021, at 10:27 AM, Tom Lane wrote: > > I wonder whether it's bloom's fault > or a core bug. Looking closer at the TAP test, it's not ORDERing the result set from the SELECTs on either node, but it is comparing the sets for stringwise equality, which is certainly order dependent
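
The order-dependence concern raised in this message can be illustrated with a small sketch. It is not taken from the TAP test: the row values and the helper names stringwise_equal and order_insensitive_equal are made up for illustration, and the language is Python rather than the Perl used by the test. It shows only why comparing two unordered result sets as strings can report a mismatch when both nodes hold the same rows; later messages in the thread consider other explanations for the failures.

```python
# Hypothetical rows standing in for the output of the same SELECT on two nodes.
# These values are invented for illustration; they are not from the real test.
primary_rows = ["1|foo", "2|bar", "3|baz"]
standby_rows = ["2|bar", "1|foo", "3|baz"]  # same rows, different return order


def stringwise_equal(a, b):
    """Compare result sets as joined strings, so row order matters."""
    return "\n".join(a) == "\n".join(b)


def order_insensitive_equal(a, b):
    """Compare sorted copies, so only content (including duplicates) matters."""
    return sorted(a) == sorted(b)


if __name__ == "__main__":
    print("stringwise:", stringwise_equal(primary_rows, standby_rows))
    print("order-insensitive:", order_insensitive_equal(primary_rows, standby_rows))
```

In SQL terms the order-insensitive comparison corresponds to adding an ORDER BY to both queries before comparing their output, assuming the ordering columns are enough to make the row order deterministic.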

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Tom Lane
I wrote: > So that's the same hardware, and identical PG source tree, and different > results. This seems to leave only two theories standing: I forgot theory 3: it's intermittent. Apparently the probability has dropped a lot since 2018, but behold: https://buildfarm.postgresql.org/cgi-bin/show

Re: Fixing WAL instability in various TAP tests

2021-09-28 Thread Andrew Dunstan
On 9/27/21 10:20 PM, Tom Lane wrote: > Michael Paquier writes: >> On Mon, Sep 27, 2021 at 04:19:27PM -0400, Tom Lane wrote: >>> I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop >>> just now, and it passed fine. The laptop is not exactly the same >>> as longfin was in 2018,

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Michael Paquier writes: > On Mon, Sep 27, 2021 at 04:19:27PM -0400, Tom Lane wrote: >> I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop >> just now, and it passed fine. The laptop is not exactly the same >> as longfin was in 2018, but it ought to be close enough. Not sure >

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Michael Paquier
On Mon, Sep 27, 2021 at 04:19:27PM -0400, Tom Lane wrote: > I tried the same thing (i.e., re-enable bloom's TAP test) on my laptop > just now, and it passed fine. The laptop is not exactly the same > as longfin was in 2018, but it ought to be close enough. Not sure > what to make of that --- mayb

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Mark Dilger writes: >> On Sep 27, 2021, at 1:19 PM, Tom Lane wrote: >> I'm a little inclined to re-enable the test without your other >> changes, just to see what happens. > That sounds like a good idea. Even if it passes at first, I'd prefer to > leave it for a week or more to have a better s

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Mark Dilger
> On Sep 27, 2021, at 1:19 PM, Tom Lane wrote: > > I'm a little inclined to re-enable the test without your other > changes, just to see what happens. That sounds like a good idea. Even if it passes at first, I'd prefer to leave it for a week or more to have a better sense of how stable it

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Tom Lane
Mark Dilger writes: > Here is a patch set, one patch per test. The third patch enables its test in > the Makefile, which is commented as having been disabled due to the test > being unstable in the build farm. Re-enabling the test might be wrong, since > the instability might not have been du

Re: Fixing WAL instability in various TAP tests

2021-09-27 Thread Mark Dilger
> On Sep 25, 2021, at 11:04 AM, Mark Dilger > wrote: > > I took Tom's response to be, "yeah, go ahead", and am mostly waiting for the > weekend to be over to see if anybody else has anything to say about it. Here is a patch set, one patch per test. The third patch enables its test in the M

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Mark Dilger
> On Sep 25, 2021, at 9:00 AM, Noah Misch wrote: > >> You may be right, but the conversation about "all possible settings" was >> started by Noah. > > You wrote, "I would expect tests which fail under legal alternate GUC settings > to be hardened to explicitly set the GUCs as they need, rathe

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Tom Lane
Noah Misch writes: > On Sat, Sep 25, 2021 at 08:20:06AM -0700, Mark Dilger wrote: >> You may be right, but the conversation about "all possible settings" was >> started by Noah. > You wrote, "I would expect tests which fail under legal alternate GUC settings > to be hardened to explicitly set the

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Noah Misch
On Sat, Sep 25, 2021 at 08:20:06AM -0700, Mark Dilger wrote: > > On Sep 25, 2021, at 7:17 AM, Tom Lane wrote: > >> Leaving the tests brittle wastes developer time. > > > > Trying to make them proof against all possible settings would waste > > a lot more time, though. > > You may be right, but t

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Tom Lane
Mark Dilger writes: > You may be right, but the conversation about "all possible settings" was > started by Noah. I was really just talking about tests that depend on wal > files not being removed, but taking no action to guarantee that, merely > trusting that under default settings they won't

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Mark Dilger
> On Sep 25, 2021, at 7:17 AM, Tom Lane wrote: > >> Leaving the tests brittle wastes developer time. > > Trying to make them proof against all possible settings would waste > a lot more time, though. You may be right, but the conversation about "all possible settings" was started by Noah.

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Tom Lane
Mark Dilger writes: >> On Sep 24, 2021, at 10:21 PM, Noah Misch wrote: >>> I would >>> expect tests which fail under legal alternate GUC settings to be hardened to >>> explicitly set the GUCs as they need, rather than implicitly relying on the >>> defaults. >> That is not the general practice in

Re: Fixing WAL instability in various TAP tests

2021-09-25 Thread Mark Dilger
> On Sep 24, 2021, at 10:21 PM, Noah Misch wrote: > >> I would >> expect tests which fail under legal alternate GUC settings to be hardened to >> explicitly set the GUCs as they need, rather than implicitly relying on the >> defaults. > > That is not the general practice in PostgreSQL tests t

Re: Fixing WAL instability in various TAP tests

2021-09-24 Thread Noah Misch
On Fri, Sep 24, 2021 at 05:33:13PM -0700, Mark Dilger wrote: > A few TAP tests in the project appear to be sensitive to reductions of the > PostgresNode's max_wal_size setting, resulting in tests failing due to wal > files having been removed too soon. The failures in the logs typically are > of t

Fixing WAL instability in various TAP tests

2021-09-24 Thread Mark Dilger
Hackers, A few TAP tests in the project appear to be sensitive to reductions of the PostgresNode's max_wal_size setting, resulting in tests failing due to wal files having been removed too soon. The failures in the logs typically are of the "requested WAL segment %s has already been removed" v