Re: Intermittent buildfarm failures on wrasse

2022-05-19 Thread Alvaro Herrera
I was looking at backpatching this to pg13. That made me realize that commit dc7420c2c927 changed things in 14; and before that commit, the bitmask that is checked is PROCARRAY_FLAGS_VACUUM, which has a definition independently from whatever proc.h says. As far as I can tell, there's no problem w

Re: Intermittent buildfarm failures on wrasse

2022-05-18 Thread Masahiko Sawada
On Sun, May 15, 2022 at 12:29 AM Alvaro Herrera wrote: > > On 2022-Apr-20, Masahiko Sawada wrote: > > > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > > > Perhaps the latter is more future-proof.

Re: Intermittent buildfarm failures on wrasse

2022-05-14 Thread Alvaro Herrera
On 2022-Apr-20, Masahiko Sawada wrote: > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > Perhaps the latter is more future-proof. > Copying only xmin-related flags in this way also makes sense to m
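The masking expression quoted above replaces only the xmin-related bits of `MyProc->statusFlags` with those from the source process and leaves every other bit alone. The C sketch below isolates that idiom. The `DEMO_*` flag names and values are hypothetical and are not PostgreSQL's real definitions in proc.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; not PostgreSQL's values. */
#define DEMO_PROC_IS_AUTOVACUUM  0x01u
#define DEMO_PROC_IN_VACUUM      0x02u
#define DEMO_PROC_IN_SAFE_IC     0x04u
#define DEMO_PROC_AFFECTS_ALL    0x20u

/* The "xmin-related" subset that may be copied from another backend. */
#define DEMO_PROC_XMIN_FLAGS (DEMO_PROC_IN_VACUUM | DEMO_PROC_IN_SAFE_IC)

/*
 * Return "mine" with its xmin-related bits replaced by those of "source",
 * keeping all of mine's other bits unchanged.
 */
static uint8_t
copy_xmin_flags(uint8_t mine, uint8_t source)
{
    uint8_t result = (uint8_t) ((mine & ~DEMO_PROC_XMIN_FLAGS) |
                                (source & DEMO_PROC_XMIN_FLAGS));

    /* Only bits inside the xmin-related mask may have changed. */
    assert(((result ^ mine) & ~DEMO_PROC_XMIN_FLAGS) == 0);
    return result;
}
```

Unlike a plain assignment, this form cannot accidentally copy a flag that is added to the struct later, which is the "future-proof" property discussed above.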

Re: Intermittent buildfarm failures on wrasse

2022-04-19 Thread Masahiko Sawada
On Wed, Apr 20, 2022 at 3:29 AM Tom Lane wrote: > > Alvaro Herrera writes: > > On 2022-Apr-15, Tom Lane wrote: > >> Here's a WIP patch for that. The only exciting thing in it is that > >> because of some undocumented cowboy programming in walsender.c, the > >> Assert((proc->statusFlags & (~PROC_

Re: Intermittent buildfarm failures on wrasse

2022-04-19 Thread Tom Lane
Alvaro Herrera writes: > On 2022-Apr-15, Tom Lane wrote: >> Here's a WIP patch for that. The only exciting thing in it is that >> because of some undocumented cowboy programming in walsender.c, the >> Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); >> in ProcArrayInstallRestoredXmin fi

Re: Intermittent buildfarm failures on wrasse

2022-04-19 Thread Alvaro Herrera
On 2022-Apr-15, Tom Lane wrote: > Here's a WIP patch for that. The only exciting thing in it is that > because of some undocumented cowboy programming in walsender.c, the > Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); > in ProcArrayInstallRestoredXmin fires unless we skip that

Re: Intermittent buildfarm failures on wrasse

2022-04-17 Thread Peter Geoghegan
On Sun, Apr 17, 2022 at 7:36 AM Andres Freund wrote: > > Part of the problem here is that we determine VACUUM's FreezeLimit by > > calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). > > What the message outputs is OldestXmin and not FreezeLimit though. My higher level point is t

Re: Intermittent buildfarm failures on wrasse

2022-04-17 Thread Andres Freund
Hi, On 2022-04-15 11:12:34 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:43 AM Andres Freund wrote: > > I think it'd be interesting - particularly for large relations or when > > looking to adjust autovac cost limits. > > > Something like: > > removable cutoff: %u, age at start: %u,

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Andres Freund writes: > On April 15, 2022 2:14:47 PM EDT, Tom Lane wrote: >> I could use some help filling in the XXX comments, because it's far >> from clear to me *why* walsenders need this to happen. > If you want to commit before: The reason is that walsenders use their xmin > to represent

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On April 15, 2022 2:14:47 PM EDT, Tom Lane wrote: >Andres Freund writes: >> On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >>> Yeah, I was also thinking about a flag in PGPROC being a more reliable >>> way to do this. Is there anything besides walsenders that should set >>> that flag? > >>

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Andres Freund writes: > On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >> Yeah, I was also thinking about a flag in PGPROC being a more reliable >> way to do this. Is there anything besides walsenders that should set >> that flag? > Not that I can think of. It's only because of hs_feedback that w

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Peter Geoghegan
On Fri, Apr 15, 2022 at 10:43 AM Andres Freund wrote: > I think it'd be interesting - particularly for large relations or when > looking to adjust autovac cost limits. > Something like: > removable cutoff: %u, age at start: %u, age at end: %u... Part of the problem here is that we determine VACU
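Peter's "OldestXmin - vacuum_freeze_min_age" can be sketched with 32-bit modular XID arithmetic. `demo_freeze_limit` is a hypothetical helper. The clamp reflects that XIDs below 3 are reserved in PostgreSQL; the real computation applies further limits, which is the "more or less" above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* XIDs 0-2 are reserved in PostgreSQL; 3 is the first normal XID. */
#define DEMO_FIRST_NORMAL_XID ((DemoXid) 3)

/*
 * Hypothetical helper: FreezeLimit = OldestXmin - freeze_min_age,
 * computed modulo 2^32 and bumped to the first normal XID if the
 * subtraction lands on a reserved value.
 */
static DemoXid
demo_freeze_limit(DemoXid oldest_xmin, uint32_t freeze_min_age)
{
    DemoXid limit;

    assert(oldest_xmin >= DEMO_FIRST_NORMAL_XID);
    limit = oldest_xmin - freeze_min_age;   /* wraps modulo 2^32 */
    if (limit < DEMO_FIRST_NORMAL_XID)
        limit = DEMO_FIRST_NORMAL_XID;
    return limit;
}
```

With freeze_min_age = 0 (the FREEZE case Peter describes elsewhere in the thread), FreezeLimit is simply OldestXmin, so a held-back OldestXmin directly holds back freezing as well.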

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On 2022-04-15 10:23:56 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:15 AM Tom Lane wrote: > > > As well as the age of OldestXmin at the start of VACUUM. > > > > Is it worth capturing and logging both of those numbers? Why is > > the age at the end more interesting than the age

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Peter Geoghegan
On Fri, Apr 15, 2022 at 10:15 AM Tom Lane wrote: > > As well as the age of OldestXmin at the start of VACUUM. > > Is it worth capturing and logging both of those numbers? Why is > the age at the end more interesting than the age at the start? As Andres said, that's often more interesting because

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Peter Geoghegan writes: > On Fri, Apr 15, 2022 at 10:05 AM Andres Freund wrote: >> The other shows >> the age of OldestXmin at the end of the vacuum. Which is influenced by >> what's currently running. > As well as the age of OldestXmin at the start of VACUUM. Is it worth capturing and logging

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Peter Geoghegan
On Fri, Apr 15, 2022 at 10:05 AM Andres Freund wrote: > I don't think they're actually that comparable. One shows how much > relfrozenxid advanced, to a large degree influenced by the time between > aggressive (or "unintentionally aggressive") vacuums. It matters more in the extreme cases. The mo

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On 2022-04-15 09:29:20 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: > > BTW, before I forget: the wording of this log message is just awful. > > On first sight, I thought that it meant that we'd computed OldestXmin > > a second time and discovered that it ad

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
I wrote: > Um, this is the logical replication launcher, not the autovac launcher. > Your observation that a sleep in get_database_list() reproduces it > confirms that, and I don't entirely see why the timing of the LR launcher > would have changed. Oh, to clarify: I misread "get_database_list()"

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, (Sent again, somehow my editor started to sometimes screw up mail headers, and ate the From:, sorry for the duplicate) On 2022-04-15 12:36:52 -0400, Tom Lane wrote: > Andres Freund writes: > > On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: > >> The something is the logical replication l

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On 2022-04-15 12:36:52 -0400, Tom Lane wrote: > Andres Freund writes: > > On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: > >> The something is the logical replication launcher. In the failing runs, > >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to > >> do so w

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Peter Geoghegan
On Fri, Apr 15, 2022 at 9:40 AM Tom Lane wrote: > > Do you think that this juxtaposition works well? > > Seems all right to me; do you have a better suggestion? No. At first I thought that mixing "which is" and "which was" wasn't quite right. I changed my mind, though. Your new wording is fine.

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Peter Geoghegan writes: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: >> BTW, before I forget: the wording of this log message is just awful. >> [ so how about ] >> "removable cutoff: %u, which was %d xids old when operation ended\n" > How the output appears when placed right before the outp
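The number in the proposed message is an age. It is the distance from the cutoff to the next XID to be assigned when VACUUM finishes, not the result of computing OldestXmin a second time. The following is a C sketch of the formatting. It assumes the age is the signed 32-bit difference `next_xid - cutoff`, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the proposed VACUUM VERBOSE line. next_xid_at_end is a
 * hypothetical parameter: the next XID to be assigned when the
 * operation ends. The unsigned subtraction wraps modulo 2^32, so the
 * age stays correct across XID wraparound.
 */
static int
format_removable_cutoff(char *buf, size_t buflen,
                        uint32_t removable_cutoff, uint32_t next_xid_at_end)
{
    int32_t age = (int32_t) (next_xid_at_end - removable_cutoff);

    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "removable cutoff: %u, which was %d xids old when operation ended\n",
                    (unsigned) removable_cutoff, (int) age);
}
```

Under that assumption, the failing wrasse run (cutoff 724, 26 xids) corresponds to a next XID of 750 when the VACUUM ended.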

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Andres Freund writes: > On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: >> The something is the logical replication launcher. In the failing runs, >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to >> do so well past the point where tenk1 gets vacuumed. > That explain

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Peter Geoghegan
On Fri, Apr 15, 2022 at 8:14 AM Tom Lane wrote: > BTW, before I forget: the wording of this log message is just awful. > On first sight, I thought that it meant that we'd computed OldestXmin > a second time and discovered that it advanced by 26 xids while the VACUUM > was running. > "removable cu

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On April 15, 2022 11:23:40 AM EDT, Tom Lane wrote: >I wrote: >> So there's no longer any doubt that something is holding back OldestXmin. >> I will go put some instrumentation into the code that's computing that. > >The something is the logical replication launcher. In the failing runs, >it

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Andres Freund writes: > On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> removable cutoff: 724, older by 26 xids when operation ended > The horizon advancing by 26 xids during tenk1's vacuum seems like quite > a bit, given there's no normal concurrent activity during test_setup. Hah, so you were

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
Andres Freund writes: > Off for a bit, but I realized that we likely don't exclude the launcher > because it's not database associated... Yeah. I think this bit in ComputeXidHorizons needs rethinking: /* * Normally queries in other databases are ignored for anything but
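Andres's point is that the launcher is not database-associated, so the per-database filter in ComputeXidHorizons does not exclude it. Below is a much-simplified C model of that filter. `DemoProc`, `include_dbless`, and `affects_all` are hypothetical names. The rule that processes with database 0 always count is an assumption based on the thread, not a copy of procarray.c. `affects_all` stands in for the explicit PGPROC flag proposed for walsenders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;
typedef uint32_t DemoOid;

#define DEMO_INVALID_XID ((DemoXid) 0)
#define DEMO_INVALID_OID ((DemoOid) 0)

/* Hypothetical, heavily simplified stand-in for a PGPROC entry. */
typedef struct DemoProc
{
    DemoOid database_id;    /* 0 for processes not attached to a database */
    DemoXid xmin;           /* advertised xmin, or 0 if none */
    bool    affects_all;    /* explicit opt-in flag (hypothetical) */
} DemoProc;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Oldest xmin that holds back pruning in my_db. If include_dbless is
 * true, every process with database_id == 0 counts; otherwise only
 * processes in my_db or with affects_all set do.
 */
static DemoXid
demo_data_horizon(const DemoProc *procs, size_t nprocs, DemoOid my_db,
                  DemoXid next_xid, bool include_dbless)
{
    DemoXid horizon = next_xid;

    assert(procs != NULL || nprocs == 0);
    for (size_t i = 0; i < nprocs; i++)
    {
        const DemoProc *p = &procs[i];
        bool counts;

        if (p->xmin == DEMO_INVALID_XID)
            continue;
        counts = p->database_id == my_db || p->affects_all ||
                 (include_dbless && p->database_id == DEMO_INVALID_OID);
        if (counts && demo_xid_precedes(p->xmin, horizon))
            horizon = p->xmin;
    }
    return horizon;
}
```

When a launcher advertises xmin 724, the "database 0 always counts" rule pins the horizon at 724. The opt-in flag lets only the flagged walsender hold it back.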

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
I wrote: > So there's no longer any doubt that something is holding back OldestXmin. > I will go put some instrumentation into the code that's computing that. The something is the logical replication launcher. In the failing runs, it is advertising xmin = 724 (the post-initdb NextXID) and continu

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On April 15, 2022 11:12:10 AM EDT, Andres Freund wrote: >Hi, > >On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> The morning's first result is that during a failing run, >> the vacuum in test_setup sees >> >> 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: >> statement: V

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
I wrote: > the vacuum in test_setup sees > ... > removable cutoff: 724, older by 26 xids when operation ended > ... BTW, before I forget: the wording of this log message is just awful. On first sight, I thought that it meant that we'd computed OldestXmin a second time and discovered that i

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Andres Freund
Hi, On 2022-04-15 10:15:32 -0400, Tom Lane wrote: > The morning's first result is that during a failing run, > the vacuum in test_setup sees > > 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: > VACUUM ANALYZE tenk1; > 2022-04-15 16:01:43.064 CEST [4436:76] pg_regres

Re: Intermittent buildfarm failures on wrasse

2022-04-15 Thread Tom Lane
The morning's first result is that during a failing run, the vacuum in test_setup sees 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: VACUUM ANALYZE tenk1; 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming "re

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 10:12:05PM -0700, Andres Freund wrote: > On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > > I suspect the failure is somehow impossible in "check". Yesterday, I > > cranked > > up the number of locales, so there are now a lot more installcheck. Before > > that, each farm

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Andres Freund writes: > On 2022-04-14 23:56:15 -0400, Tom Lane wrote: >> Well, damn. I changed my script that way and it failed on the tenth >> iteration (versus a couple hundred successful iterations the other >> way). > Just to make sure: This is also on wrasse? Right, gcc211 with a moderatel

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > up the number of locales, so there are now a lot more installcheck. Before > that, each farm run had one "check" and two "installcheck". Those days saw > ten inst

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 23:56:15 -0400, Tom Lane wrote: > I wrote: > > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > > each time, whereas I'd just stopped and started the postmaster of > > an existing installation. That does not seem like it could matter > > but ... > > Well,

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 11:56:15PM -0400, Tom Lane wrote: > Anyway, I'm too tired to do more tonight, but now that I can reproduce it > I will stick some debugging logic in tomorrow. I no longer think we > should clutter the git repo with any more short-term hacks. Sounds good. I've turned off t

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 22:40:51 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > Suppose that the bug was actually in 06f5295af6, "Add single-item > > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > > your timeline just as well? > > I'd dismissed that on the grounds that

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
I wrote: > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > each time, whereas I'd just stopped and started the postmaster of > an existing installation. That does not seem like it could matter > but ... Well, damn. I changed my script that way and it failed on the tenth it

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 11:06:04PM -0400, Tom Lane wrote: > Noah Misch writes: > > But 24s after that email, it did reproduce the problem. > > Ain't that always the way. Quite so. > > Same symptoms as the > > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > > vario

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > Well, Noah is running wrasse with 'fsync = off'. And did so in the > script as well. As am I. I duplicated wrasse's config to the extent of cat >>$PGDATA/postgresql.conf <

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 7:54 PM Tom Lane wrote: > This is far from the first time that I've failed to reproduce a buildfarm > result manually, even on the very machine hosting the animal. I would > like to identify the cause(s) of that. One obvious theory is that the > environment under a cron j

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch writes: > But 24s after that email, it did reproduce the problem. Ain't that always the way. > Same symptoms as the > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > various references to my home directory, so it's not self-contained.) That's interestin

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch writes: > Like Tom, I'm failing to reproduce this outside the buildfarm client. This is far from the first time that I've failed to reproduce a buildfarm result manually, even on the very machine hosting the animal. I would like to identify the cause(s) of that. One obvious theory is

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 07:45:15PM -0700, Noah Misch wrote: > On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > > Peter Geoghegan writes: > > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > > the sam

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Noah Misch
On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > Peter Geoghegan writes: > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > the same set of symptoms (the same index-only scan create_index > > > regress

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > Suppose that the bug was actually in 06f5295af6, "Add single-item > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > your timeline just as well? I'd dismissed that on the grounds that there are no subtrans XIDs involved in tenk1's contents. Howev

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 7:20 PM Tom Lane wrote: > Oh! You mean that maybe the OldestXmin horizon was fine, but something > decided not to update hint bits (and therefore also not the all-visible > bit) anyway? Worth investigating I guess. Yes. That is starting to seem like a plausible alternati

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > That was the intent, but that in itself doesn't mean that it isn't > something to do with setting hint bits (not the OldestXmin horizon > being held back). Oh! You mean that maybe the OldestXmin horizon was fine, but something decided not to update hint bits (and theref

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 6:53 PM Tom Lane wrote: > That band-aid only addressed the situation of someone having turned > off synchronous_commit in the first place; which is not the case > on wrasse or most/all other buildfarm animals. Whatever we're > dealing with here is something independent of

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > Anyway, I suppose it's possible that problems reappeared here due to > some other patch. Something else could have broken Andres' earlier > band aid solution (which was to set synchronous_commit=on in > test_setup). That band-aid only addressed the situation of someone h

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > the same set of symptoms (the same index-only scan create_index > > regressions), > > Hm. I'm starting to get the feeling that the real probl

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 6:32 PM Tom Lane wrote: > Hm. I'm starting to get the feeling that the real problem here is > we've "optimized" the system to the point where repeatable results > from VACUUM are impossible :-( I don't think that there is any fundamental reason why VACUUM cannot have repe

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > the same set of symptoms (the same index-only scan create_index > regressions), Hm. I'm starting to get the feeling that the real problem here is we've "optimized" the system to the point where repe

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan wrote: > A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and > "2022-04-14 22:49:19.088" all have the same "removable cutoff". Are you aware of Andres' commit 02fea8fd? That work prevented exactly the same set of symptoms (the same

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 3:23 PM Tom Lane wrote: > If we captured equivalent output from the manual VACUUM in test_setup, > maybe something could be learned. However, it seems virtually certain > to me that the problematic xmin is in some background process > (eg autovac launcher) and thus wouldn'

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Peter Geoghegan writes: > Have you looked at the autovacuum log output in more detail? I don't think there's anything to be learned there. The first autovacuum in wrasse's log happens long after things went south: 2022-04-14 22:49:15.177 CEST [9427:1] LOG: automatic vacuum of table "regressio

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 2:33 PM Tom Lane wrote: > Meanwhile, wrasse did fail with my relallvisible check in place [1], > and what that shows is that relallvisible is *zero* to start with > and remains so throughout the CREATE INDEX sequence. That pretty > definitively proves that it's not a page-

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
I wrote: > Andres Freund writes: >> Thanks! Can you repro the problem manually on wrasse, perhaps even >> outside the buildfarm script? > I'm working on that right now, actually... So far, reproducing it manually has been a miserable failure: I've run about 180 cycles of the core regression test

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan wrote: > It looks like you're changing the elevel convention for these "extra" > messages with this patch. That might be fine, but don't forget about > similar ereports() in vacuumparallel.c. I think that the elevel should > probably remain uniform

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 9:48 AM Andres Freund wrote: > I think it might be worth leaving in, but let's debate that separately? > I'm thinking of something like the attached. The current convention for the "extra" ereport()s that VACUUM VERBOSE outputs at INFO elevel is to use DEBUG2 elevel in all

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 12:26:20 -0400, Tom Lane wrote: > Andres Freund writes: > > Thanks! Can you repro the problem manually on wrasse, perhaps even > > outside the buildfarm script? Ah, cool. > I'm working on that right now, actually... > > > I wonder if we should make VACUUM log the VERBOSE out

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Andres Freund writes: > Thanks! Can you repro the problem manually on wrasse, perhaps even > outside the buildfarm script? I'm working on that right now, actually... > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Peter Geoghegan
On Thu, Apr 14, 2022 at 9:18 AM Andres Freund wrote: > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed that > information, and it's practically impossible to include in regression > output. Then we'd know what the xid

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Andres Freund
Hi, On 2022-04-14 12:01:23 -0400, Tom Lane wrote: > Noah Misch writes: > > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > >> Noah, any chance you could enable log_autovacuum_min_duration=0 on > >> wrasse? > > > Done. Also forced hourly builds. Thanks! Can you repro the proble

Re: Intermittent buildfarm failures on wrasse

2022-04-14 Thread Tom Lane
Noah Misch writes: > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: >> Noah, any chance you could enable log_autovacuum_min_duration=0 on >> wrasse? > Done. Also forced hourly builds. Thanks! We now have two failing runs with the additional info [1][2], and in both, it's clear

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Noah Misch
On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > Noah, any chance you could enable log_autovacuum_min_duration=0 on > wrasse? Done. Also forced hourly builds.

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Andres Freund writes: > Noah, any chance you could enable log_autovacuum_min_duration=0 on > wrasse? +1 > Does sparc have wider alignment rules for some types? Perhaps that'd be > enough to put some tables to be sufficiently larger to trigger parallel > vacuum? No, the configure results on wras

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, Noah, any chance you could enable log_autovacuum_min_duration=0 on wrasse? On 2022-04-13 21:23:12 -0400, Tom Lane wrote: > I'm still suspicious of the pgstat changes, though. I checked into > things here by doing > > initdb > edit postgresql.conf to set log_autovacuum_min_durat

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Andres Freund writes: > On 2022-04-13 20:35:50 -0400, Tom Lane wrote: >> It seems like a SQL-accessible function could be written >> and then called before any problematic VACUUM. I like this better >> for something we're thinking of jamming in post-feature-freeze; >> we'd not be committing to th

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 6:05 PM Andres Freund wrote: > I think most of those we've ended up replacing by using temp tables in > those tests instead, since they're not affected by the global horizon > anymore. Maybe, but it's a pain to have to work that way. You can't do it in cases like this, bec

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan wrote: > I think that it's more likely that FREEZE will correct problems, out of the > two: > > * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a > cutoff value as possible (FreezeLimit will be equal to OldestXmin). The reason w

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, On 2022-04-13 20:35:50 -0400, Tom Lane wrote: > Peter Geoghegan writes: > > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: > >> Yeah, we have band-aided around this type of problem repeatedly. > >> Making a fix that's readily accessible from any test script > >> seems like a good idea. > >

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 5:35 PM Tom Lane wrote: > My guess is that you'd need both this new wait-for-horizon behavior > *and* DISABLE_PAGE_SKIPPING. But the two together ought to make > for pretty reproducible behavior. I noticed while scanning the > commit log that some patches have tried addin

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, On 2022-04-13 18:54:06 -0400, Tom Lane wrote: > We used to have a rate limit on how often stats reports would be sent > to the collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. Just for posterity: That's not actual

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: >> Yeah, we have band-aided around this type of problem repeatedly. >> Making a fix that's readily accessible from any test script >> seems like a good idea. > We might even be able to consistently rely on this new option,

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote: > On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: > > So what seems to be happening on wrasse is that a background > > autovacuum (or really autoanalyze?) is preventing pages from > > being marked all-visible not only during test_setup.sql

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane wrote: > Yeah, we have band-aided around this type of problem repeatedly. > Making a fix that's readily accessible from any test script > seems like a good idea. We might even be able to consistently rely on this new option, given *any* problem involving t

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: >> It seems like a reliable fix might require test_setup to wait >> for any background autovac to exit before it does its own >> vacuums. Ick. > This is hardly a new problem, really. I wonder if it's worth inventing > a c

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane wrote: > So what seems to be happening on wrasse is that a background > autovacuum (or really autoanalyze?) is preventing pages from > being marked all-visible not only during test_setup.sql but > also create_index.sql; but it's gone by the time sanity_chec

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 4:13 PM Andres Freund wrote: >> IIRC the problem in matter isn't skipped pages, but that the horizon simply >> isn't new enough to mark pages as all visible. > Sometimes OldestXmin can go backwards in VACUUM operations that are > run in close su

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund wrote: > IIRC the problem in matter isn't skipped pages, but that the horizon simply > isn't new enough to mark pages as all visible. Sometimes OldestXmin can go backwards in VACUUM operations that are run in close succession against the same table,
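Andres's explanation is that the horizon simply isn't new enough to mark pages as all visible. The C sketch below reduces VACUUM's all-visible decision to that horizon check. `DemoTuple` and `demo_page_all_visible` are hypothetical, and real VACUUM handles many more tuple states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* Hypothetical, simplified tuple header: only what this sketch needs. */
typedef struct DemoTuple
{
    DemoXid xmin;           /* inserting transaction */
    bool    xmin_committed; /* did the inserter commit? */
    bool    has_xmax;       /* deleted or locked? */
} DemoTuple;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * A page qualifies only if every tuple was inserted by a committed
 * transaction older than the removable cutoff (OldestXmin) and none
 * is deleted or locked.
 */
static bool
demo_page_all_visible(const DemoTuple *tuples, size_t ntuples,
                      DemoXid oldest_xmin)
{
    assert(tuples != NULL || ntuples == 0);
    for (size_t i = 0; i < ntuples; i++)
    {
        const DemoTuple *t = &tuples[i];

        if (!t->xmin_committed || t->has_xmax)
            return false;
        if (!demo_xid_precedes(t->xmin, oldest_xmin))
            return false;
    }
    return true;
}
```

Suppose the cutoff is stuck at the post-initdb NextXID (724) while tenk1's rows carry later xmins. Then no page qualifies, which is consistent with the relallvisible of zero reported upthread.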

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Andres Freund
Hi, On April 13, 2022 7:06:33 PM EDT, David Rowley wrote: >On Thu, 14 Apr 2022 at 10:54, Tom Lane wrote: >> After a bit more navel-contemplation I see a way that the pgstats >> work could have changed timing in this area. We used to have a >> rate limit on how often stats reports would be sent

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 3:54 PM Tom Lane wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay b

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread David Rowley
On Thu, 14 Apr 2022 at 10:54, Tom Lane wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay bef

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Apr 13, 2022 at 3:08 PM Tom Lane wrote: >> I'm tempted to add something like >> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; >> so that we can confirm or refute the theory that relallvisible is >> the driving factor. > It would be fairl

Re: Intermittent buildfarm failures on wrasse

2022-04-13 Thread Peter Geoghegan
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane wrote: > I'm tempted to add something like > > SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; > > so that we can confirm or refute the theory that relallvisible is > the driving factor. It would be fairly straightforward to commit a
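relallvisible matters because the planner turns it into an all-visible fraction, and index-only scan costing uses that fraction to discount heap fetches. When visfrac is 0, an index-only scan looks no cheaper than a plain index scan. The C sketch below is modeled on, but not copied from, estimate_rel_size() and cost_index(). Both helper names are hypothetical, and rounding is omitted.

```c
#include <assert.h>

/* Fraction of the table's pages believed all-visible, clamped to [0, 1]. */
static double
demo_allvisfrac(double relallvisible, double curpages)
{
    if (relallvisible <= 0 || curpages <= 0)
        return 0.0;
    if (relallvisible >= curpages)
        return 1.0;
    return relallvisible / curpages;
}

/* Heap pages an index-only scan is still expected to visit (unrounded). */
static double
demo_ios_heap_fetches(double pages_fetched, double allvisfrac)
{
    assert(pages_fetched >= 0 && allvisfrac >= 0 && allvisfrac <= 1);
    return pages_fetched * (1.0 - allvisfrac);
}
```

Under this model, Tom's check `relallvisible = relpages` corresponds to demo_allvisfrac returning 1.0, where index-only scans are expected to skip the heap entirely.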

Intermittent buildfarm failures on wrasse

2022-04-13 Thread Tom Lane
For the past five days or so, wrasse has been intermittently failing due to unexpectedly not using an Index Only Scan plan in the create_index test [1], eg @@ -1910,11 +1910,15 @@ SELECT unique1 FROM tenk1 WHERE unique1 IN (1,42,7) ORDER BY unique1; - QUERY PLAN