On Friday, July 25, 2025 2:33 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jul 23, 2025 at 12:53 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > Thanks for pushing. I have rebased the remaining patches.
> >
>
> + * This function performs a full table scan instead of using indexes because
> + * index scans could miss deleted tuples if an index has been re-indexed or
> + * re-created during change applications.
>
> IIUC, once the tuple is not found during update, the patch does an additional
> scan with SnapshotAny to find the DEAD tuple, so that it can report
> update_deleted conflict, if one is found. The reason in the comments to do
> sequential scan in such cases sound reasonable but I was thinking if we can
> do index scans if the pg_conflict_* slot's xmin is ahead of the RI (or any usable
> index that can be used during scan) index_tuple's xmin? Note, we use a similar
> check with the indcheckxmin parameter in pg_index though the purpose of
> that is different. If this can happen then still in most cases the index scan
> will happen.

Right, I think it makes sense to use an index scan when the index's xmin is
less than the conflict detection xmin, as that ensures that all the tuples
deleted before the index creation or re-indexing are irrelevant for conflict
detection. I have implemented this in the V53 patch set and improved the test
to verify both index and seq scans for dead tuples.

The V53-0001 also includes Shveta's comments in [1].

Apart from the above issue, I'd like to clarify why the patch scans all
matching dead tuples in the relation to find the most recently deleted one,
and I will share an example for the same.

The main reason is that only the latest deletion information is relevant for
resolving conflicts. If the first tuple retrieved is an old one while a newer
deleted tuple exists, users may incorrectly resolve the remote change when
applying a last-update-win strategy. Here is an example:

1. In a bidirectional cluster setup, both nodes are initially empty:

Node A: tbl (empty)

Node B: tbl (empty)

2. Then a user performs the following operations on Node A and waits for them
to be replicated to Node B:

INSERT (pk, 1)
DELETE (pk, 1) @9:00
INSERT (pk, 1)

The data on both nodes looks like:

Node A: tbl
  (pk, 1) - live tuple
  (pk, 1) - dead tuple - @9:00

Node B: tbl
  (pk, 1) - live tuple
  (pk, 1) - dead tuple - @9:00

3. A user then does DELETE (pk) on Node B @9:02, and does
UPDATE (pk, 1)->(pk, 2) on Node A @9:01.

When applying the UPDATE on Node B, the apply worker cannot find the target
tuple, so it searches the dead tuples, but there are two of them:

Node B: tbl
  (pk, 1) - dead tuple - @9:00
  (pk, 1) - dead tuple - @9:02

If we only fetch the first tuple in the scan, it could be either a) the tuple
deleted @9:00, which is older than the remote UPDATE, or b) the tuple deleted
@9:02, which is newer than the remote UPDATE @9:01. With a last-update-win
strategy, the user may choose to apply the UPDATE in case a), which can cause
data inconsistency between nodes.

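The scenario above can be sketched in a few lines of Python. This is only an
illustrative model, not code from the patch: DeadTuple, first_dead_tuple,
latest_dead_tuple and remote_update_wins are hypothetical names, timestamps
are simplified to minutes since midnight, and the resolver is assumed to be a
plain last-update-win comparison.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeadTuple:
    """Hypothetical stand-in for a dead tuple and its deletion time."""
    key: str
    deleted_at: int  # minutes since midnight, e.g. 9:02 -> 542


def hhmm(hours: int, minutes: int) -> int:
    """Convert a wall-clock time into minutes since midnight."""
    return hours * 60 + minutes


def first_dead_tuple(tuples: Iterable[DeadTuple], key: str) -> Optional[DeadTuple]:
    """Stop at the first matching dead tuple; the result depends on scan order."""
    for t in tuples:
        if t.key == key:
            return t
    return None


def latest_dead_tuple(tuples: Iterable[DeadTuple], key: str) -> Optional[DeadTuple]:
    """Scan every matching dead tuple and keep the most recently deleted one."""
    matches = [t for t in tuples if t.key == key]
    return max(matches, key=lambda t: t.deleted_at, default=None)


def remote_update_wins(remote_update_at: int, dead: DeadTuple) -> bool:
    """Assumed last-update-win rule: the later of UPDATE and DELETE wins."""
    return remote_update_at > dead.deleted_at


# Node B in step 3: two dead tuples for the same key.
node_b_dead = [DeadTuple("pk", hhmm(9, 0)), DeadTuple("pk", hhmm(9, 2))]
remote_update_at = hhmm(9, 1)

# Case a): the scan happens to return the @9:00 tuple, so the UPDATE is applied.
stale = first_dead_tuple(node_b_dead, "pk")
print(remote_update_wins(remote_update_at, stale))

# Reporting the @9:02 tuple lets the resolver skip the remote UPDATE.
newest = latest_dead_tuple(node_b_dead, "pk")
print(remote_update_wins(remote_update_at, newest))
```

In this model the first-match lookup depends on scan order, while the
full scan gives the same answer regardless of the order in which the dead
tuples are visited.
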
Ideally, we should give the resolver the newest dead tuple (@9:02), so the
resolver can choose to ignore the remote UPDATE, keeping the data consistent.

[1] https://www.postgresql.org/message-id/CAJpy0uDiyjDzLU-%3DNGO7PnXB4OLy4%2BRxJiAySdw%3Da%2BYO62JO2g%40mail.gmail.com

Best Regards,
Hou zj

v53-0003-Re-create-the-replication-slot-if-the-conflict-r.patch
v53-0001-Support-the-conflict-detection-for-update_delete.patch
v53-0002-Introduce-a-new-GUC-max_conflict_retention_durat.patch