On Mon, Feb 22, 2021 at 9:39 AM Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2021-02-22 08:22:35 +0530, Amit Kapila wrote:
> > On Mon, Feb 22, 2021 at 3:56 AM Andres Freund <and...@anarazel.de> wrote:
> > >
> > > On 2021-02-21 11:32:29 +0530, Amit Kapila wrote:
> > > > Here, I am assuming you are asking to disable 2PC both via
> > > > apply-worker and tablesync worker till the initial sync (aka all
> > > > tables are in SUBREL_STATE_READY state) phase is complete. If we do
> > > > that and what if commit prepared happened after the initial sync phase
> > > > but prepare happened before that?
> > >
> > > Isn't that pretty easy to detect? You compare the LSN of the tx prepare
> > > with the LSN of achieving consistency?
> > >
> >
> > I think by LSN of achieving consistency, you mean start_decoding_at
> > LSN.
>
> Kinda, but not in the way you suggest. I mean the LSN at which the slot
> reached SNAPBUILD_CONSISTENT. Which also is the point in the WAL stream
> we exported the initial snapshot for.
>

Okay, that's an interesting idea. I have few questions on this, see below.

> My understanding of why you need to have special handling of 2pc PREPARE
> is that the initial snapshot will not contain the contents of the
> prepared transaction, therefore you need to send it out at some point
> (or be incorrect).
>
> Your solution to this is:
>         /*
>          * It is possible that this transaction is not decoded at prepare time
>          * either because by that time we didn't have a consistent snapshot 
> or it
>          * was decoded earlier but we have restarted. We can't distinguish 
> between
>          * those two cases so we send the prepare in both the cases and let
>          * downstream decide whether to process or skip it. We don't need to
>          * decode the xact for aborts if it is not done already.
>          */
>         if (!rbtxn_prepared(txn) && is_commit)
>
> but IMO this violates a pretty fundamental tenant of how logical
> decoding is supposed to work, i.e. that data that the client
> acknowledges as having received (via lsn passed to START_REPLICATION)
> shouldn't be sent out again.
>

I agree that this is not acceptable that is why trying to explore
other solutions including what you have proposed.

> What I am proposing is to instead track the point at which the slot
> gained consistency - a simple LSN. That way you can change the above
> logic to instead be
>
> if (txn->final_lsn > snapshot_was_exported_at_lsn)
>    ReorderBufferReplay();
> else
>    ...
>

With this if the prepare is prior to consistent_snapshot
(snapshot_was_exported_at_lsn)) and commit prepared is after then we
won't send the prepare and data. Won't we need to send such prepares?
If the condition is other way (if (txn->final_lsn <
snapshot_was_exported_at_lsn)) then we would send such prepares?

Just to clarify, after the initial copy, say when we start/restart the
streaming and we picked the serialized snapshot of some other
WALSender, we don't need to use snapshot_was_exported_at_lsn
corresponding to the serialized snapshot of some other slot?

I am not sure for the matter of this problem enabling 2PC during
initial sync (initial snapshot + copy) matters. Because, if we follow
the above, then it should be fine even if 2PC is enabled?

> That will easily work across restarts, won't lead to sending data twice,
> etc.
>

Yeah, we need to probably store this new point as slot's persistent information.

-- 
With Regards,
Amit Kapila.


Reply via email to