On Thu, Aug 8, 2024 at 12:03 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu) > <kuroda.hay...@fujitsu.com> wrote: > > > ... > > > > An easiest fix is to reset session replication origin before calling the > > RecordTransactionAbort(). I think this can happen when 1) > > LogicalRepApplyLoop() > > raises an ERROR or 2) apply worker exits. Attached patch can fix the issue > > on HEAD. > > > > Few comments: > ============= > * > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos) > } > PG_CATCH(); > { > + /* > + * Reset the origin data to prevent the advancement of origin progress > + * if the transaction failed to apply. > + */ > + replorigin_session_origin = InvalidRepOriginId; > + replorigin_session_origin_lsn = InvalidXLogRecPtr; > + replorigin_session_origin_timestamp = 0; > > Can't we call replorigin_reset() instead here? > > * > + /* > + * Register a callback to reset the origin state before aborting the > + * transaction in ShutdownPostgres(). This is to prevent the advancement > + * of origin progress if the transaction failed to apply. > + */ > + before_shmem_exit(replorigin_reset, (Datum) 0); > > I think we need this despite resetting the origin-related variables in > PG_CATCH block to handle FATAL error cases, right? If so, can we use > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
+1 Basic tests work fine on this patch. Just thinking out loud, SetupApplyOrSyncWorker() is called for table-sync worker as well and IIUC tablesync worker does not deal with 2PC txns. So do we even need to register replorigin_reset() for tablesync worker as well? If we may hit such an issue in general, then perhaps we need it in table-sync worker otherwise not. It needs some investigation. Thoughts? thanks Shveta