On Wed, Mar 26, 2025 at 4:17 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> Here's a rebased version of the patch series.
>

Thanks for the patches.

While testing the GUC "max_conflict_retention_duration", I noticed a
behavior that seems to bypass its intended purpose.

On the publisher, if a txn is stuck in the COMMIT phase for a long time,
the apply worker on the subscriber keeps looping in
wait_for_publisher_status() until that concurrent txn on the publisher
completes its commit. As a result, the apply worker can't advance its
oldest_nonremovable_xid and keeps waiting for the publisher's txn to
finish.

In such a case, even if the wait time exceeds the configured
max_conflict_retention_duration, conflict info retention doesn't stop
for the apply worker. Retention is stopped only once the publisher's
txn is committed and the apply worker moves on to
wait_for_local_flush().

Doesn't this defeat the purpose of max_conflict_retention_duration? The
apply worker has exceeded the max wait time but still retains the
conflict info.

I think we should consider applying the same max time limit check inside
wait_for_publisher_status() as well.

--
Thanks,
Nisha
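
To make the suggestion concrete, here is a rough standalone sketch of the
kind of check meant above. It is not code from the patch: the struct, the
helper names (retention_duration_exceeded, publisher_wait_step), the
microsecond timestamp type, and the "0 means no limit" rule are all
assumptions made for illustration only. The idea is simply that each
iteration of the publisher-status wait tests the elapsed time against the
limit before deciding to keep waiting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp in microseconds. */
typedef int64_t timestamp_us;

/* Hypothetical per-worker state used only by this sketch. */
typedef struct RetentionWaitState
{
    timestamp_us wait_start;     /* when the current wait began */
    bool         stop_retention; /* set once the limit is exceeded */
} RetentionWaitState;

/*
 * Return true if the worker has waited longer than max_duration_ms.
 * Assumption: a value <= 0 means "no limit".
 */
static bool
retention_duration_exceeded(const RetentionWaitState *state,
                            timestamp_us now,
                            int max_duration_ms)
{
    if (max_duration_ms <= 0)
        return false;

    return (now - state->wait_start) > (timestamp_us) max_duration_ms * 1000;
}

/*
 * One iteration of a publisher-status wait loop (sketch).
 * Returns true if the worker should keep waiting, false otherwise.
 * When the limit is exceeded, retention is flagged to stop even though
 * the publisher's concurrent txn has not finished yet.
 */
static bool
publisher_wait_step(RetentionWaitState *state,
                    timestamp_us now,
                    int max_duration_ms,
                    bool publisher_txn_done)
{
    if (retention_duration_exceeded(state, now, max_duration_ms))
    {
        state->stop_retention = true;
        return false;
    }

    return !publisher_txn_done;
}
```

With a check like this in the wait loop, a long-running COMMIT on the
publisher would no longer keep the subscriber retaining conflict info
past the configured limit.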