Em seg., 10 de mar. de 2025 às 12:22, Heikki Linnakangas <hlinn...@iki.fi> escreveu:
> On 21/01/2025 12:05, Alexander Korotkov wrote: > > On Tue, Jan 21, 2025 at 12:03 PM Alexander Korotkov > > <aekorot...@gmail.com> wrote: > >> On Sun, Dec 29, 2024 at 3:59 PM Heikki Linnakangas <hlinn...@iki.fi> > wrote: > >>> However, I think GetLatestSnapshot() is wrong here anyway. None of this > >>> matters for heapam which just ignores the 'snapshot' argument, but > let's > >>> assume a different AM that would use the snapshot to identify the tuple > >>> version. The TID was fetched from an index using an index scan with > >>> SnapshotDirty. There's no guarantee that the LatestSnapshot would match > >>> the same tuple version that the index scan found. If an MVCC snapshot > is > >>> needed, surely it should be acquired before the index scan, and used > for > >>> the index scan as well. > >>> > >>> I see three options: > >>> > >>> 1. Remove the 'snapshot' argument, since it's not used by heapam. If we > >>> get a table AM where a single TID represents multiple row versions, > this > >>> will need to be revisited. > >>> > >>> 2. Rewrite the recheck code in execReplication.c so that it uses the > >>> snapshot in a more consistent fashion. Call GetLatestSnapshot() first, > >>> and use the same snapshot in the index scan and table_tuple_lock(). > >>> Acquiring a snapshot isn't free though, so it would be nice to avoid > >>> doing that when the heapam is just going to ignore it anyway. If we go > >>> with this option, I think we could reuse the snapshot that is already > >>> active in most cases, and only take a new snapshot if the tuple was > >>> concurrently updated. > >>> > >>> 3. Modify the tableam interface so that the index scan can return a > more > >>> unique identifier of the tuple version. In heapam, it could be the TID > >>> like today, but a different AM could return some other token. Continue > >>> to use SnapshotDirty in the index scan, but in the call to > >>> table_tuple_lock(), instead of passing GetLatestSnapshot() and TID, > pass > >>> the token you got index_getnext_slot(). > >> > >> Thank you for your finding and analysis. I would vote for #3. > >> Currently snapshot argument is not properly used, and the heapam code > >> doesn't check it. I think allowing flexible tuple identifier is > >> essential for future of table AM API. Therefore, any > >> snapshot/visibility information could be put inside that tuple > >> identifier if needed. > > > > To be more specific, I vote for #1 + #3. Remove the snapshot argument > > for now, then work on flexible tuple identifier. > > I committed and backpatched the trivial fix for not, just replacing > GetLatestSnapshot() with GetActiveSnapshot(). I got cold feet on just > removing the argument without providing a replacement, because it's not > 100% clear to me if the current situation is broken or not. > > To summarize the issue, RelationFindReplTupleByIndex() performs an index > scan with DirtySnapshot, and then calls table_tuple_lock() with a fresh > MVCC snapshot to lock the row. There's no guarantee that the index scan > found the same row version that table_tuple_lock() will lock, if the TID > alone doesn't uniquely identify it. But that's still OK as long as the > key column hasn't changed, like with heapam's HOT updates. I couldn't > convince myself that it's broken nor that it's guaranteed to be correct > with other AMs. > Just for curiosity. *RelationFindReplTupleSeq* on the same source, does not suffer from the same issue? PushActiveSnapshot(GetLatestSnapshot()); res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(), best regards, Ranier Vilela