RE: Conflict detection for update_deleted in logical replication

Zhijie Hou (Fujitsu) Wed, 21 May 2025 20:00:45 -0700

On Fri, May 16, 2025 at 7:31 PM Amit Kapila wrote:
> 
> A few more comments:
> =================
> 3.
> maybe_advance_nonremovable_xid(RetainConflictInfoData *data,
>    bool status_received)
> {
> /* Exit early if retaining conflict information is not required */
> if (!MySubscription->retainconflictinfo)
> return;
> 
> /*
> * It is sufficient to manage non-removable transaction ID for a
> * subscription by the main apply worker to detect update_deleted conflict
> * even for table sync or parallel apply workers.
> */
> if (!am_leader_apply_worker())
> return;
> 
> /* Exit early if we have already stopped retaining */
> if (MyLogicalRepWorker->stop_conflict_info_retention)
> return;
> ...
> 
> get_candidate_xid()
> {
> ...
> if (!TimestampDifferenceExceeds(data->candidate_xid_time, now,
> data->xid_advance_interval))
> return;
> 
> Would it be better to encapsulate all these preliminary checks that
> decide whether we can move to computing oldest_nonremovable_xid in a
> separate function? The check in get_candidate_xid would require some
> additional conditions because it is not required in every phase.
> Additionally, we can move the core phase processing logic to compute
> in a separate function. We can try this to see if the code looks
> better with such a refactoring.


I moved the switch case into a separate function process_rci_phase_transition()
and call it in each phase handling function (get_candidate_xid etc).

I also added a new function can_advance_nonremovable_xid to maintain the
preliminary checks. But after re-thinking, the timer check cannot be moved into
this function because it's needed even if being called from the new function
process_rci_phase_transition()->get_candidate_xid() to ensure we do not get
next xid too frequently.

> 
> 4.
> + /*
> + * Check if all remote concurrent transactions that were active at the
> + * first status request have now completed. If completed, proceed to the
> + * next phase; otherwise, continue checking the publisher status until
> + * these transactions finish.
> + */
> + if (FullTransactionIdPrecedesOrEquals(data->last_phase_at,
> +   remote_full_xid))
> + data->phase = RCI_WAIT_FOR_LOCAL_FLUSH;
> 
> I think there is a possibility of optimization here for cases where
> there are no new transactions on the publisher side across the next
> cycle of advancement of oldest_nonremovable_xid. We can simply set
> candidate_xid as oldest_nonremovable_xid instead of going into
> RCI_WAIT_FOR_LOCAL_FLUSH phase. If you want to keep the code simple
> for the first version, then at least note that down in comments, but
> OTOH, if it is simple to achieve, then let's do it now.

I think to implement this optimization, it's needed to compare both
remote_nextxid and remote_lsn across consecutive cycles. Although
remote_nextxid might remain unchanged between cycles, old transactions might
have been committed in between two cycles, not affecting nextxid. Therefore,
maintaining two fields last_remote_nextxid and last_remote_lsn within the
structure for comparison is required. Additionally, this optimization implies
skipping the clock skew check in last phase, unless we move the check to a
earlier place. Given that the cost of continuing in RCI_WAIT_FOR_LOCAL_FLUSH
when there's no publisher activity is minimal, I personally prefer keeping the
code simple in this version.

Best Regards,
Hou zj

RE: Conflict detection for update_deleted in logical replication

Reply via email to