On Thu, Jan 16, 2025 at 2:02 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 2:20 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > In the latest version, we implemented a simpler approach that allows the
> > apply worker to directly advance the oldest_nonremovable_xid if the
> > waiting time exceeds the newly introduced option's limit. I've named this
> > option "max_conflict_retention_duration," as it aligns better with the
> > conflict detection concept and the "retain_conflict_info" option.
> >
> > During the last phase (RCI_WAIT_FOR_LOCAL_FLUSH), the apply worker
> > evaluates how much time it has spent waiting. If this duration exceeds the
> > max_conflict_retention_duration, the worker directly advances the
> > oldest_nonremovable_xid and logs a message indicating the forced
> > advancement of the non-removable transaction ID.
> >
> > This approach is a bit like a time-based option that discussed before.
> > Compared to the slot invalidation approach, this approach is simpler
> > because we can avoid adding 1) new slot invalidation type due to apply
> > lag, 2) new field lag_behind in shared memory (MyLogicalRepWorker) to
> > indicate when the lag exceeds the limit, and 3) additional logic in the
> > launcher to handle each worker's lag status.
> >
> > In the slot invalidation, user would be able to confirm if the current by
> > checking if the slot in pg_replication_slot in invalidated or not, while
> > in the simpler approach mentioned, user could only confirm that by
> > checking the LOGs.
> >
>
> The user needs to check the LOGs corresponding to all subscriptions on
> the node.
> I see the simplicity of the approach you used but still the
> slot_invalidation idea sounds better to me on the grounds that it will
> be convenient for users/DBA to know when to rely on the update_missing
> type conflict if there is a valid and active slot with the name
> 'pg_conflict_detection' (or whatever name we decide to give) then
> users can rely on the detected conflict. Sawada-San, and others, do
> you have any preference on this matter?

I also think that it would be convenient for users if they could check
whether there is a valid and active pg_conflict_detection slot to know
when to rely on detected conflicts. On the other hand, I think it would
not be convenient for users if we always required user intervention to
re-create the slot. Once the slot is invalidated or dropped, we can no
longer guarantee that update_deleted conflicts are detected reliably,
but logical replication would still be running. That means we might
have already been missing update_deleted conflicts. From the user's
perspective, it would be cumbersome to disable and re-enable
retain_conflict_info (and check whether the slot was re-created) just
to make retain_conflict_info work again.

> Do we want to prohibit the combination copy_data as true and
> retain_conflict_info=true? I understand that with the new parameter
> 'max_conflict_retention_duration', for large copies slot would anyway
> be invalidated but I don't want to give users more ways to see this
> slot invalidated in the beginning itself. Similarly during ALTER
> SUBSCRIPTION, if the initial synch is in progress, we can disallow
> enabling retain_conflict_info. Later, if there is a real demand for
> such a combination, we can always enable it.

Does it mean that whenever users want to start the initial sync, they
need to disable retain_conflict_info on all subscriptions? That doesn't
seem very convenient.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com