Hi, On 2020-02-28 22:10:52 -0800, Andres Freund wrote: > On 2020-02-28 21:24:59 -0800, Andres Freund wrote: > > Turns out that I am to blame for that. All the way back in 9.4. For > > logical decoding I needed to make ScanPgRelation() use a specific type > > of snapshot during one corner case of logical decoding. For reasons lost > > to time, I didn't continue to pass NULL to systable_beginscan() in the > > plain case, but did an explicit GetCatalogSnapshot(RelationRelationId). > > Note the missing RegisterSnapshot()... > > Oh, I pierced through the veil: It's likely because the removal of > SnapshotNow happened concurrently to the development of logical > decoding. Before using proper snapshot for catalog scans passing in > SnapshotNow was precisely the right thing to do... > > I think that's somewhat unlikely to actively cause problems in practice, > as ScanPgRelation() requires that we already have a lock on the > relation, we only look for a single row, and I don't think we rely on > the result's tid to be correct. I don't immediately see a case where > this would trigger in a problematic way.
Pushed a fix for this. > So, um. What happens is that doDeletion() does a catalog scan, which > sets a snapshot. The results of that catalog scan are then used to > perform modifications. But at that point there's no guarantee that we > still hold *any* snapshot, as e.g. invalidations can trigger the catalog > snapshot being released. > > I don't see how that's safe. Without ->xmin preventing that, > intermediate row versions that we did look up could just get vacuumed > away, and replaced with a different row. That does seem like a serious > issue? > > I think there's likely a lot of places that can hit this? What makes it > safe for InvalidateCatalogSnapshot()->SnapshotResetXmin() to release > ->xmin when there previously has been *any* catalog access? Because in > contrast to normal table modifications, there's currently nothing at all > forcing us to hold a snapshot between catalog lookups an their > modifications? > > Am I missing something? Or is this a fairly significant hole in our > arrangements? > > The easiest way to fix this would probably be to have inval.c call a > version of InvalidateCatalogSnapshot() that leaves the oldest catalog > snapshot around, but sets up things so that GetCatalogSnapshot() will > return a freshly taken snapshot? ISTM that pretty much every > InvalidateCatalogSnapshot() call within a transaction needs that behaviour? I'd still like to get some input here. Greetings, Andres Freund