On Thu, Apr 24, 2025 at 6:11 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
> > Few comments for patch004:
> >
> > Config.sgml:
> >
> > 1)
> > + <para>
> > + Maximum duration (in milliseconds) for which conflict
> > + information can be retained for conflict detection by the apply worker.
> > + The default value is <literal>0</literal>, indicating that conflict
> > + information is retained until it is no longer needed for detection
> > + purposes.
> > + </para>
> >
> > IIUC, the above is not entirely accurate. Suppose the subscriber manages to
> > catch up and sets oldest_nonremovable_xid to 100, which is then updated in
> > slot. After this, the apply worker takes a nap and begins a new xid update
> > cycle.
> > Now, let’s say the next candidate_xid is 200, but this time the subscriber
> > fails to keep up and exceeds max_conflict_retention_duration. As a result,
> > it sets oldest_nonremovable_xid to InvalidTransactionId, and the launcher
> > skips updating the slot’s xmin.
>
> If the time exceeds the max_conflict_retention_duration, the launcher would
> Invalidate the slot, instead of skipping updating it. So the conflict
> info(e.g., dead tuples) would not be retained anymore.
>

The launcher will not invalidate the slot until all subscriptions have
stopped conflict_info retention. So the dead-tuple info for a particular
apply worker's oldest_xmin could be retained for much longer than the
configured duration. If the other apply workers are actively working
(catching up with the primary), they should keep advancing the xmin of the
shared slot. But if the shared slot's xmin stays the same for, say,
15 min + 15 min + 15 min across 3 apply workers (assuming they mark
themselves with stop_conflict_retention one after another and the slot's
xmin is never advanced), then the first apply worker to mark itself with
stop_conflict_retention still has its oldest_xmin's data retained for
45 mins instead of 15 mins (where max_conflict_retention_duration = 15 mins).
Please let me know if my understanding is wrong.
> > However, the previous xmin value (100) is still there
> > in the slot, causing its data to be retained beyond the
> > max_conflict_retention_duration. The xid 200 which actually honors
> > max_conflict_retention_duration was never marked for retention. If my
> > understanding is correct, then the documentation doesn’t fully capture this
> > scenario.
>
> As mentioned above, the strategy here is to invalidate the slot.

Please consider the case with multiple subscribers. Sorry if I missed
mentioning in my previous email that it was a multi-sub case.

thanks
Shveta