On Mon, Jan 6, 2025 at 3:22 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Friday, January 3, 2025 2:36 PM Masahiko Sawada <sawada.m...@gmail.com> > wrote: > > Hi, > > > > > I have one comment on the 0001 patch: > > Thanks for the comments! > > > > > + /* > > + * The changes made by this and later transactions are still > > non-removable > > + * to allow for the detection of update_deleted conflicts when > > applying > > + * changes in this logical replication worker. > > + * > > + * Note that this info cannot directly protect dead tuples from > > being > > + * prematurely frozen or removed. The logical replication launcher > > + * asynchronously collects this info to determine whether to advance > > the > > + * xmin value of the replication slot. > > + * > > + * Therefore, FullTransactionId that includes both the > > transaction ID and > > + * its epoch is used here instead of a single Transaction ID. This > > is > > + * critical because without considering the epoch, the transaction > > ID > > + * alone may appear as if it is in the future due to transaction ID > > + * wraparound. > > + */ > > + FullTransactionId oldest_nonremovable_xid; > > > > The last paragraph of the comment mentions that we need to use > > FullTransactionId to properly compare XIDs even after the XID wraparound > > happens. But once we set the oldest-nonremovable-xid it prevents XIDs from > > being wraparound, no? I mean that workers' > > oldest-nonremovable-xid values and slot's non-removal-xid (i.e., its > > xmin) are never away from more than 2^31 XIDs. > > I think the issue is that the launcher may create the replication slot after > the apply worker has already set the 'oldest_nonremovable_xid' because the > launcher are doing that asynchronously. So, Before the slot is created, > there's > a window where transaction IDs might wrap around. If initially the apply > worker > has computed a candidate_xid (755) and the xid wraparound before the launcher > creates the slot, causing the new current xid to be (740), then the old > candidate_xid(755) looks like a xid in the future, and the launcher could > advance the xmin to 755 which cause the dead tuples to be removed prematurely. > (We are trying to reproduce this to ensure that it's a real issue and will > share after finishing)
The slot's first xmin is calculated by GetOldestSafeDecodingTransactionId(false). The initial computed cancidate_xid could be newer than this xid? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com