On Mon, Feb 22, 2021 at 9:39 AM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2021-02-22 08:22:35 +0530, Amit Kapila wrote: > > On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <and...@anarazel.de> wrote: > > > > > > On 2021-02-21 11:32:29 +0530, Amit Kapila wrote: > > > > Here, I am assuming you are asking to disable 2PC both via > > > > apply-worker and tablesync worker till the initial sync (aka all > > > > tables are in SUBREL_STATE_READY state) phase is complete. If we do > > > > that and what if commit prepared happened after the initial sync phase > > > > but prepare happened before that? > > > > > > Isn't that pretty easy to detect? You compare the LSN of the tx prepare > > > with the LSN of achieving consistency? > > > > > > > I think by LSN of achieving consistency, you mean start_decoding_at > > LSN. > > Kinda, but not in the way you suggest. I mean the LSN at which the slot > reached SNAPBUILD_CONSISTENT. Which also is the point in the WAL stream > we exported the initial snapshot for. >
Okay, that's an interesting idea. I have few questions on this, see below. > My understanding of why you need to have special handling of 2pc PREPARE > is that the initial snapshot will not contain the contents of the > prepared transaction, therefore you need to send it out at some point > (or be incorrect). > > Your solution to this is: > /* > * It is possible that this transaction is not decoded at prepare time > * either because by that time we didn't have a consistent snapshot > or it > * was decoded earlier but we have restarted. We can't distinguish > between > * those two cases so we send the prepare in both the cases and let > * downstream decide whether to process or skip it. We don't need to > * decode the xact for aborts if it is not done already. > */ > if (!rbtxn_prepared(txn) && is_commit) > > but IMO this violates a pretty fundamental tenant of how logical > decoding is supposed to work, i.e. that data that the client > acknowledges as having received (via lsn passed to START_REPLICATION) > shouldn't be sent out again. > I agree that this is not acceptable that is why trying to explore other solutions including what you have proposed. > What I am proposing is to instead track the point at which the slot > gained consistency - a simple LSN. That way you can change the above > logic to instead be > > if (txn->final_lsn > snapshot_was_exported_at_lsn) > ReorderBufferReplay(); > else > ... > With this if the prepare is prior to consistent_snapshot (snapshot_was_exported_at_lsn)) and commit prepared is after then we won't send the prepare and data. Won't we need to send such prepares? If the condition is other way (if (txn->final_lsn < snapshot_was_exported_at_lsn)) then we would send such prepares? Just to clarify, after the initial copy, say when we start/restart the streaming and we picked the serialized snapshot of some other WALSender, we don't need to use snapshot_was_exported_at_lsn corresponding to the serialized snapshot of some other slot? I am not sure for the matter of this problem enabling 2PC during initial sync (initial snapshot + copy) matters. Because, if we follow the above, then it should be fine even if 2PC is enabled? > That will easily work across restarts, won't lead to sending data twice, > etc. > Yeah, we need to probably store this new point as slot's persistent information. -- With Regards, Amit Kapila.