Hi, On 2020-02-28 22:10:52 -0800, Andres Freund wrote: > So, um. What happens is that doDeletion() does a catalog scan, which > sets a snapshot. The results of that catalog scan are then used to > perform modifications. But at that point there's no guarantee that we > still hold *any* snapshot, as e.g. invalidations can trigger the catalog > snapshot being released. > > I don't see how that's safe. Without ->xmin preventing that, > intermediate row versions that we did look up could just get vacuumed > away, and replaced with a different row. That does seem like a serious > issue? > > I think there's likely a lot of places that can hit this? What makes it > safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release > ->xmin when there previously has been *any* catalog access? Because in > contrast to normal table modifications, there's currently nothing at all > forcing us to hold a snapshot between catalog lookups an their > modifications? > > Am I missing something? Or is this a fairly significant hole in our > arrangements?
I still think that's true. In a first iteration I hacked around the problem by explicitly registering a catalog snapshot in RemoveTempRelations(). That *sometimes* allows to get through the regression tests without the assertions triggering. But I don't think that's good enough (even if we fixed the other potential crashes similarly). The only reason that avoids the asserts is because in nearly all other cases there's also a user snapshot that's pushed. But that pushed snapshot can have an xmin that's newer than the catalog snapshot, which means we're still in danger of tids from catalog scans being outdated. My preliminary conclusion is that it's simply not safe to do SnapshotResetXmin() from within InvalidateCatalogSnapshot(), PopActiveSnapshot(), UnregisterSnapshotFromOwner() etc. Instead we need to defer the SnapshotResetXmin() call until at least CommitTransactionCommand()? Outside of that there ought (with exception of multi-transaction commands, but they have to be careful anyway) to be no "in progress" sequences of related catalog lookups/modifications. Alternatively we could ensure that all catalog lookup/mod sequences ensure that the first catalog snapshot is registered. But that seems like a gargantuan task? > The easiest way to fix this would probably be to have inval.c call a > version of InvalidateCatalogSnapshot() that leaves the oldest catalog > snapshot around, but sets up things so that GetCatalogSnapshot() will > return a freshly taken snapshot? ISTM that pretty much every > InvalidateCatalogSnapshot() call within a transaction needs that behaviour? A related question is in which cases is it actually safe to use a snapshot that's not registered, nor pushed as the active snapshot. snapmgr.c just provides: * Note that the return value may point at static storage that will be modified * by future calls and by CommandCounterIncrement(). Callers should call * RegisterSnapshot or PushActiveSnapshot on the returned snap if it is to be * used very long. but doesn't clarify what 'very long' means. As far as I can tell, there's very little that actually safe. It's probably ok to do a single visiblity test, but anything that e.g. has a chance of accepting invalidations is entirely unsafe? Greetings, Andres Freund