RE: Conflict detection for update_deleted in logical replication

Zhijie Hou (Fujitsu) Fri, 25 Jul 2025 04:09:23 -0700

On Friday, July 25, 2025 2:33 PM Amit Kapila <[email protected]> wrote:
> 
> On Wed, Jul 23, 2025 at 12:53 PM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > Thanks for pushing. I have rebased the remaining patches.
> >
> 
> + * This function performs a full table scan instead of using indexes because
> + * index scans could miss deleted tuples if an index has been re-indexed or
> + * re-created during change applications.
> 
> IIUC, once the tuple is not found during update, the patch does an additional
> scan with SnapshotAny to find the DEAD tuple, so that it can report
> update_deleted conflict, if one is found. The reason in the comments to do
> sequential scan in such cases sound reasonable but I was thinking if we can
> do index scans if the pg_conflict_* slot's xmin is ahead of the RI (or any
> usable index that can be used during scan) index_tuple's xmin? Note, we use a
> similar check with the indcheckxmin parameter in pg_index though the purpose of
> that is different. If this can happen then still in most cases the index scan
> will happen.


Right, I think it makes sense to use an index scan when the index's xmin is
less than the conflict detection xmin, because that ensures all tuples
deleted before the index was created or re-indexed are irrelevant for conflict
detection.
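
To illustrate the rule above, here is a minimal sketch of the scan choice. It
is not code from the patch: the function name and parameters are hypothetical,
and xids are modelled as plain integers, so xid wraparound is ignored.

```python
from typing import Optional


def choose_dead_tuple_scan(index_xmin: Optional[int],
                           conflict_detection_xmin: int) -> str:
    """Pick the scan type used to search for dead tuples.

    Hypothetical helper for illustration only. Xids are plain integers
    here, which ignores the wraparound handling a real comparison needs.
    """
    # No usable index at all: only a sequential scan is possible.
    if index_xmin is None:
        return "seq"
    # The index is older than the conflict detection xmin, so any deletion
    # that predates the index is irrelevant for conflict detection.
    if index_xmin < conflict_detection_xmin:
        return "index"
    # The index was created or re-indexed too recently and may miss
    # dead tuples that still matter, so fall back to a sequential scan.
    return "seq"
```

For example, under these assumptions `choose_dead_tuple_scan(100, 200)` selects
the index scan, while an index rebuilt at xid 300 falls back to the seq scan.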

I have implemented this in the V53 patch set and improved the test to verify both
index and sequential scans for dead tuples.

The V53-0001 also includes Shveta's comments in [1].

Apart from the above issue, I'd like to clarify why the patch scans all
matching dead tuples in the relation to find the most recently deleted one,
and share an example.

The main reason is that only the latest deletion information is relevant for
resolving conflicts. If the first tuple retrieved is an old one while a newer
deleted tuple exists, users may resolve the remote change incorrectly when
applying a last-update-win strategy. Here is an example:

1. In a bidirectional cluster setup, both nodes initially have an empty table:
 
Node A: tbl (empty)
Node B: tbl (empty)
 
2. Then a user performs the following operations on Node A and waits for them
to be replicated to Node B:
 
INSERT (pk, 1)
DELETE (pk, 1) @9:00
INSERT (pk, 1)
 
The data on both nodes looks like:
 
Node A: tbl (pk, 1) - live tuple
           (pk, 1) - dead tuple - @9:00
Node B: tbl (pk, 1) - live tuple
           (pk, 1) - dead tuple - @9:00
 
3. A user then does DELETE (pk) on Node B @9:02, and UPDATE (pk, 1)->(pk, 2)
   on Node A @9:01.
 
When the UPDATE is applied on Node B, the target tuple cannot be found, so the
dead tuples are searched, but there are two of them:
 
Node B: tbl (pk, 1) - dead tuple - @9:00
           (pk, 1) - dead tuple - @9:02
 
If we only fetch the first tuple in the scan, it could be either a) the tuple
deleted @9:00, which is older than the remote UPDATE, or b) the tuple deleted
@9:02, which is newer than the remote UPDATE @9:01. With a last-update-win
strategy, the user may choose to apply the UPDATE in case a), which can cause
data inconsistency between the nodes.

Ideally, we should give the resolver the newer dead tuple @9:02, so the resolver
can choose to ignore the remote UPDATE, keeping the data consistent.
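
The example above can be replayed with a small sketch. It is an illustration
only, not the patch's code: DeadTuple, latest_deletion and
resolve_last_update_win are hypothetical names, and plain timestamps stand in
for the deletion times shown in the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeadTuple:
    """Hypothetical stand-in for a dead tuple found during the scan."""
    key: str
    value: int
    deleted_at: datetime


def latest_deletion(dead_tuples: Iterable[DeadTuple],
                    key: str) -> Optional[DeadTuple]:
    """Scan all matching dead tuples and return the most recently deleted."""
    matches = [t for t in dead_tuples if t.key == key]
    return max(matches, key=lambda t: t.deleted_at, default=None)


def resolve_last_update_win(remote_update_at: datetime,
                            dead: DeadTuple) -> str:
    """Apply the remote UPDATE only if it is newer than the local deletion."""
    return "apply_remote" if remote_update_at > dead.deleted_at else "keep_local"


# Node B's dead tuples from step 3, with the remote UPDATE made @9:01.
day = datetime(2025, 7, 25)
dead_on_node_b = [
    DeadTuple("pk", 1, day.replace(hour=9, minute=0)),
    DeadTuple("pk", 1, day.replace(hour=9, minute=2)),
]
remote_update_at = day.replace(hour=9, minute=1)

# Using only the first tuple found (case a) versus the latest deletion.
first_only = resolve_last_update_win(remote_update_at, dead_on_node_b[0])
with_latest = resolve_last_update_win(
    remote_update_at, latest_deletion(dead_on_node_b, "pk"))
```

Here first_only is "apply_remote", which is the inconsistent outcome of case a),
while with_latest is "keep_local" because the deletion @9:02 is newer than the
remote UPDATE.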

[1] https://www.postgresql.org/message-id/CAJpy0uDiyjDzLU-%3DNGO7PnXB4OLy4%2BRxJiAySdw%3Da%2BYO62JO2g%40mail.gmail.com

Best Regards,
Hou zj

v53-0003-Re-create-the-replication-slot-if-the-conflict-r.patch

v53-0001-Support-the-conflict-detection-for-update_delete.patch

v53-0002-Introduce-a-new-GUC-max_conflict_retention_durat.patch
