On Tue, Dec 1, 2020 at 12:46 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> So what caused it to skip due to start_decoding_at? Because the commit
> where the snapshot became consistent is after Prepare. Does it happen
> due to the below code in SnapBuildFindSnapshot() where we bump
> start_decoding_at.
>
> {
> ...
>     if (running->oldestRunningXid == running->nextXid)
>     {
>         if (builder->start_decoding_at == InvalidXLogRecPtr ||
>             builder->start_decoding_at <= lsn)
>             /* can decode everything after this */
>             builder->start_decoding_at = lsn + 1;

I think the reason is that in the function DecodingContextFindStartpoint(),
the code loops until it finds a consistent snapshot. Once a consistent
snapshot is found, it sets

    slot->data.confirmed_flush = ctx->reader->EndRecPtr;

This will be used as the start_decoding_at when the slot is restarted
for decoding.

> Sure, but you can see in your example above it got skipped due to
> start_decoding_at not due to DecodingContextReady. So, the problem as
> mentioned by me previously was how we distinguish those cases because
> it can skip due to start_decoding_at during restart as well when we
> would have already sent the prepare to the subscriber.

The distinguishing factor is that at restart, the Prepare does satisfy
DecodingContextReady (because the snapshot is consistent then).
In both cases, the prepare is prior to start_decoding_at, but when the
prepare is before a consistent point, it does not satisfy
DecodingContextReady. That is why I suggested using the
DecodingContextReady check to mark the prepare as 'Not decoded'.

regards,
Ajin Cherian
Fujitsu Australia
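
To make the distinction above concrete, here is a minimal sketch of the
classification. It is not the patch and not PostgreSQL code: the Lsn
type, the PrepareAction enum and classify_prepare() are hypothetical
names made up for illustration, and context_ready only stands in for
the result of the DecodingContextReady check at the time the prepare
record is read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's XLogRecPtr. */
typedef uint64_t Lsn;

/* Hypothetical outcomes for a PREPARE record seen during decoding. */
typedef enum
{
    PREPARE_DECODE,             /* not before start_decoding_at: decode it */
    PREPARE_SKIP_ALREADY_SENT,  /* skipped after restart, snapshot consistent */
    PREPARE_SKIP_NOT_DECODED    /* skipped before the consistent point */
} PrepareAction;

/*
 * Classify a prepare record. In both skip cases the prepare lies before
 * start_decoding_at; only context_ready tells the two cases apart.
 */
static PrepareAction
classify_prepare(Lsn prepare_lsn, Lsn start_decoding_at, bool context_ready)
{
    if (prepare_lsn >= start_decoding_at)
        return PREPARE_DECODE;

    /* Prepare is before start_decoding_at, so it is skipped either way. */
    if (context_ready)
        return PREPARE_SKIP_ALREADY_SENT;

    return PREPARE_SKIP_NOT_DECODED;
}
```

With made-up LSNs, classify_prepare(100, 200, false) models a prepare
seen before the snapshot became consistent, so it would be marked
'Not decoded', while classify_prepare(100, 200, true) models the restart
case where the prepare was already sent to the subscriber.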