On Tue, Dec 7, 2021 at 6:07 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > Hi all, > > While updating the patch I recently posted[1] to make pg_waldump > report replication origin ID, LSN, and timestamp, I found a bug that > replication origin timestamp is not set in ROLLBACK PREPARED case. > Commit 8bdb1332eb5 (CC'ed Amit) added an argument to > ReorderBufferFinishPrepared() but in ROLLBACK PREPARED case, the > caller specified it at the wrong position: > > @@ -730,6 +730,7 @@ DecodeCommit(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf, > if (two_phase) > { > ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, > buf->endptr, > + > SnapBuildInitialConsistentPoint(ctx->snapshot_builder), > commit_time, origin_id, origin_lsn, > parsed->twophase_gid, true); > } > @@ -868,6 +869,7 @@ DecodeAbort(LogicalDecodingContext *ctx, > XLogRecordBuffer *buf, > { > ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr, > buf->endptr, > abort_time, origin_id, origin_lsn, > + InvalidXLogRecPtr, > parsed->twophase_gid, false); > } > > This affects the value of rollback_data.rollback_time on the > subscriber, resulting in setting the wrong timestamp to both the > replication origin timestamp and the error callback argument on the > subscriber. I've attached the patch to fix it. > > Besides, I think we can improve checking input data on subscribers. > This bug was not detected by compilers but it could have been detected > if we checked the input data. Looking at logicalrep_read_xxx functions > in proto.c, there are some inconsistencies; we check the value of > prepare_data->xid in logicalrep_read_prepare_common() but we don't in > both logicalrep_read_commit_prepared() and > logicalrep_read_rollback_prepared(), and we don't check anything in > stream_start/stream_stop. Also, IIUC that since timestamps are always > set in prepare/commit prepared/rollback prepared cases we can check > them too. I've attached a PoC patch that introduces macros for > checking input data and adds some new checks. Since it could be > frequently called, I used unlikely() but probably we can also consider > replacing elog(ERROR) with assertions.
The first patch is required as suggested and fixes the problem. Few comments on the second patch: 1) Should we validate prepare_time and xid in logicalrep_read_begin_prepare. Is this not checked intentionally? 2) Similarly should we validate committime in logicalrep_read_stream_commit? Regards, Vignesh