On Fri, May 16, 2025 at 7:31 PM Amit Kapila wrote: > > A few more comments: > ================= > 3. > maybe_advance_nonremovable_xid(RetainConflictInfoData *data, > bool status_received) > { > /* Exit early if retaining conflict information is not required */ > if (!MySubscription->retainconflictinfo) > return; > > /* > * It is sufficient to manage non-removable transaction ID for a > * subscription by the main apply worker to detect update_deleted conflict > * even for table sync or parallel apply workers. > */ > if (!am_leader_apply_worker()) > return; > > /* Exit early if we have already stopped retaining */ > if (MyLogicalRepWorker->stop_conflict_info_retention) > return; > ... > > get_candidate_xid() > { > ... > if (!TimestampDifferenceExceeds(data->candidate_xid_time, now, > data->xid_advance_interval)) > return; > > Would it be better to encapsulate all these preliminary checks that > decide whether we can move to computing oldest_nonremovable_xid in a > separate function? The check in get_candidate_xid would require some > additional conditions because it is not required in every phase. > Additionally, we can move the core phase processing logic to compute > in a separate function. We can try this to see if the code looks > better with such a refactoring.
I moved the switch case into a separate function process_rci_phase_transition() and call it in each phase handling function (get_candidate_xid etc). I also added a new function can_advance_nonremovable_xid to maintain the preliminary checks. But after re-thinking, the timer check cannot be moved into this function because it's needed even if being called from the new function process_rci_phase_transition()->get_candidate_xid() to ensure we do not get next xid too frequently. > > 4. > + /* > + * Check if all remote concurrent transactions that were active at the > + * first status request have now completed. If completed, proceed to the > + * next phase; otherwise, continue checking the publisher status until > + * these transactions finish. > + */ > + if (FullTransactionIdPrecedesOrEquals(data->last_phase_at, > + remote_full_xid)) > + data->phase = RCI_WAIT_FOR_LOCAL_FLUSH; > > I think there is a possibility of optimization here for cases where > there are no new transactions on the publisher side across the next > cycle of advancement of oldest_nonremovable_xid. We can simply set > candidate_xid as oldest_nonremovable_xid instead of going into > RCI_WAIT_FOR_LOCAL_FLUSH phase. If you want to keep the code simple > for the first version, then at least note that down in comments, but > OTOH, if it is simple to achieve, then let's do it now. I think to implement this optimization, it's needed to compare both remote_nextxid and remote_lsn across consecutive cycles. Although remote_nextxid might remain unchanged between cycles, old transactions might have been committed in between two cycles, not affecting nextxid. Therefore, maintaining two fields last_remote_nextxid and last_remote_lsn within the structure for comparison is required. Additionally, this optimization implies skipping the clock skew check in last phase, unless we move the check to a earlier place. Given that the cost of continuing in RCI_WAIT_FOR_LOCAL_FLUSH when there's no publisher activity is minimal, I personally prefer keeping the code simple in this version. Best Regards, Hou zj