Em qua., 29 de set. de 2021 às 06:55, Drouvot, Bertrand <bdrou...@amazon.com> escreveu:
> Hi, > > On 9/28/21 6:50 PM, Tom Lane wrote: > > "Drouvot, Bertrand" <bdrou...@amazon.com> writes: > >> Does it make sense (as it is currently) to set the ActiveSnapshot to > >> NULL and not ensuring the same is done for ActivePortal->portalSnapshot? > > I think that patch is just a kluge :-( > right, sounded too simple to be "true". > > > > After tracing through this I've figured out what is happening, and > > why you need to put the exception block inside a loop to make it > > happen. The first iteration goes fine, and we end up with an empty > > ActiveSnapshot stack (and no portalSnapshots) because that's how > > the COMMIT leaves it. In the second iteration, we try to > > re-establish the portal snapshot, but at the point where > > EnsurePortalSnapshotExists is called (from the first INSERT > > command) we are already inside a subtransaction thanks to the > > plpgsql exception block. This means that the stacked ActiveSnapshot > > has as_level = 2, although the Portal owning it belongs to the > > outer transaction. So at the next exception, AtSubAbort_Snapshot > > zaps the stacked ActiveSnapshot, but the Portal stays alive and > > now it has a dangling portalSnapshot pointer. > > Thanks for the explanation! > > > Basically there seem to be two ways to fix this, both requiring > > API additions to snapmgr.c (hence, cc'ing Alvaro for opinion): > > > > 1. Provide a variant of PushActiveSnapshot that allows the caller > > to specify the as_level to use, and then have > > EnsurePortalSnapshotExists, as well as other places that create > > Portal-associated snapshots, use this with as_level equal to the > > Portal's createSubid. The main hazard I see here is that this could > > theoretically allow the ActiveSnapshot stack to end up with > > out-of-order as_level values, causing issues. For the moment we > > could probably just add assertions to check that the passed as_level > > is >= next level, or maybe even that this variant is only called with > > empty ActiveSnapshot stack. > I think we may get a non empty ActiveSnapshot stack with prepared > statements, so tempted to do the assertion on the levels. > > > > 2. Provide a way for AtSubAbort_Portals to detect whether a > > portalSnapshot pointer points to a snap that's going to be > > deleted by AtSubAbort_Snapshot, and then just have it clear any > > portalSnapshots that are about to become dangling. (This'd amount > > to accepting the possibility that portalSnapshot is of a different > > subxact level from the portal, and dealing with the situation.) > > > > I initially thought #2 would be the way to go, but it turns out > > to be a bit messy since what we have is a Snapshot pointer not an > > ActiveSnapshotElt pointer. We'd have to do something like search the > > ActiveSnapshot stack looking for pointer equality to the caller's > > Snapshot pointer, which seems fragile --- do we assume as_snap is > > unique for any other purpose? > > > > That being the case, I'm now leaning to #1. Thoughts? > > I'm also inclined to #1. > I have a stupid question, why duplicate PushActiveSnapshot? Wouldn't one function be better? PushActiveSnapshot(Snapshot snap, int as_level); Sample calls: PushActiveSnapshot(GetTransactionSnapshot(), GetCurrentTransactionNestLevel()); PushActiveSnapshot(queryDesc->snapshot, GetCurrentTransactionNestLevel()); PushActiveSnapshot(GetTransactionSnapshot(), portal->createSubid); regards, Ranier Vilela