On 2/5/25 05:37, Peter Eisentraut wrote:
On 29.01.25 07:34, Paul Jungwirth wrote:
Is it possible to commit an RI_PLAN_NO_ACTION addition and see if that makes the buildfarm failures go away? Here is a proposed patch for that (v48.1). I would understand if this is too questionable a practice---but it would be nice to get sufficient test exposure to see if it makes a difference. Since I still haven't reproduced this locally (despite running continuously for almost a week), it's not an experiment I can do myself. If it *does* make the failures go away, then it suggests there is still some latent problem somewhere.

I'm tempted to give this a try.  But the cfbot is currently in a bit of a mess, so I'll wait until that is clean again so that we can have a usable baseline to work against.

Okay, thanks! I've been spending some more time on this, but I haven't made 
much progress.

It's surely not as simple as just oid wrapround. Here is a bpftrace script to show when we change TransamVariables->nextOid:

BEGIN {
  @setnext = 0
}

u:/home/paul/local/bin/postgres:GetNewObjectId {
  @newoids[tid] += 1
}

u:/home/paul/local/bin/postgres:SetNextObjectId {
  @setnext += 1
}

When I run this during `make installcheck` I get only 29608 total calls to GetNewObjectId, and none for SetNextObjectId.

I've also been looking at the dynahash code a bit. With gdb I can give two constraint oids a hash collision, but of course that isn't sufficient, since we memcmp the whole key as well.

Last night I started looking at ri_constraint_cache, which is maybe a little more interesting due to the syscache invalidation code. A parallel test could cause an invalidation between lines of the without_overlaps test. Getting the wrong riinfo could make us treat a RESTRICT constraint as NO ACTION. But I don't see any way for that to happen yet.

I have too much confidence in the Postgres codebase to really expect to find bugs in any of these places. And yet I don't see how 1772d554b0 could make a RESTRICT test fail, since all its changes are wrapped in `if (is_no_action)`---except if the RESTRICT constraint is somehow executing the NO ACTION query by mistake.

Anyway I'll keep at it!

Yours,

--
Paul              ~{:-)
p...@illuminatedcomputing.com



Reply via email to