Re: Conflict detection for update_deleted in logical replication

Masahiko Sawada Fri, 17 Jan 2025 00:07:51 -0800

On Thu, Jan 16, 2025 at 2:02 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 2:20 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > In the latest version, we implemented a simpler approach that allows the
> > apply worker to directly advance the oldest_nonremovable_xid if the waiting
> > time exceeds the newly introduced option's limit. I've named this option
> > "max_conflict_retention_duration," as it aligns better with the conflict
> > detection concept and the "retain_conflict_info" option.
> >
> > During the last phase (RCI_WAIT_FOR_LOCAL_FLUSH), the apply worker evaluates
> > how much time it has spent waiting. If this duration exceeds the
> > max_conflict_retention_duration, the worker directly advances the
> > oldest_nonremovable_xid and logs a message indicating the forced
> > advancement of the non-removable transaction ID.
> >
> > This approach is a bit like the time-based option that was discussed before.
> > Compared to the slot invalidation approach, this approach is simpler because
> > we can avoid adding 1) a new slot invalidation type due to apply lag, 2) a new
> > field lag_behind in shared memory (MyLogicalRepWorker) to indicate when the
> > lag exceeds the limit, and 3) additional logic in the launcher to handle each
> > worker's lag status.
> >
> > With slot invalidation, the user would be able to confirm the current state
> > by checking whether the slot in pg_replication_slots is invalidated, while
> > with the simpler approach, the user could only confirm that by checking the
> > logs.
> >
>
> The user needs to check the logs corresponding to all subscriptions on
> the node. I see the simplicity of the approach you used, but the
> slot invalidation idea still sounds better to me, on the grounds that it
> will be more convenient for users/DBAs to know when to rely on the
> update_missing type conflict: if there is a valid and active slot with the
> name 'pg_conflict_detection' (or whatever name we decide to give it), then
> users can rely on the detected conflict. Sawada-San, and others, do
> you have any preference on this matter?


I also think that it would be convenient for users if they could check
if there was a valid and active pg_conflict_detection slot to know
when to rely on detected conflicts. On the other hand, I think it
would not be convenient for users if we always required user
intervention to re-create the slot. Once the slot is invalidated or
dropped, we can no longer guarantee that update_deleted conflicts are
detected reliably, but the logical replication would still be running.
That means we might have already been missing update_deleted
conflicts. From the user perspective, it would be cumbersome to
disable/enable retain_conflict_info (and check if the slot was
re-created) just to make retain_conflict_info work again.
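
For reference, the time-based approach quoted above could be sketched
roughly as follows. This is only an illustration and not code from the
patch: the struct, the function names, the millisecond timestamps, and the
convention that a limit of 0 disables the check are all assumptions made
for the sketch, and xid wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the apply worker's retention state. */
typedef struct RetentionState
{
    int64_t  wait_start_ms;            /* when the local-flush wait began */
    uint32_t oldest_nonremovable_xid;  /* xid currently being retained */
    uint32_t candidate_xid;            /* xid the worker wants to advance to */
    bool     forced;                   /* true once an advancement was forced */
} RetentionState;

/*
 * Returns true if the wait has lasted longer than the limit.
 * Assumption: a limit <= 0 means "no limit".
 */
bool
retention_wait_exceeded(const RetentionState *st, int64_t now_ms,
                        int64_t max_duration_ms)
{
    assert(st != NULL);

    if (max_duration_ms <= 0)
        return false;

    return (now_ms - st->wait_start_ms) > max_duration_ms;
}

/*
 * Forces the advancement when the limit is exceeded, and records that it
 * was forced. Returns true if the xid was advanced by this call.
 */
bool
maybe_force_advance(RetentionState *st, int64_t now_ms,
                    int64_t max_duration_ms)
{
    if (!retention_wait_exceeded(st, now_ms, max_duration_ms))
        return false;

    st->oldest_nonremovable_xid = st->candidate_xid;
    st->forced = true;
    return true;
}
```

In the log-only approach, the equivalent of the forced flag here is visible
to users only as a log message, whereas with slot invalidation the same
information would be visible on the slot itself.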

> Do we want to prohibit the combination of copy_data=true and
> retain_conflict_info=true? I understand that with the new parameter
> 'max_conflict_retention_duration', the slot would be invalidated anyway
> for large copies, but I don't want to give users more ways to see this
> slot invalidated right at the beginning. Similarly, during ALTER
> SUBSCRIPTION, if the initial sync is in progress, we can disallow
> enabling retain_conflict_info. Later, if there is real demand for
> such a combination, we can always enable it.

Does it mean that whenever users want to start the initial sync they
need to disable retain_conflict_info on all subscriptions? That
doesn't seem very convenient.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
