Re: repeated decoding of prepared transactions

Markus Wanner Mon, 22 Feb 2021 00:25:34 -0800

On 20.02.21 13:15, Amit Kapila wrote:

I think after the patch Ajin proposed decoders won't need any special
checks after receiving the prepared xacts. What additional simplicity
this approach will bring?

The API becomes clearer in that all PREPAREs are always decoded in WALstream order and are not ever deferred (possibly until after the commitsof many other transactions). No output plugin will need to checkagainst this peculiarity, but can rely on WAL ordering of events.

(And if an output plugin does not want prepares to be individual events,it should simply not enable two-phase support. That seems likesomething the output plugin could even do on a per-transaction basis.)

Do you mean to say that after creating the slot we take an additional
pass over WAL (till the LSN where we found a consistent snapshot) to
collect all prepared transactions and wait for them to get
committed/rollbacked?

No. A single pass is enough, the decoder won't need any further changebeyond the code removal in my patch.

I'm proposing for the synchronization logic (in e.g. pgoutput) to deferthe snapshot taking. So that there's some time in between creating thelogical slot (at step 1.) and taking a snapshot (at step 4.). AnotherCATCHUP phase, if you want.


So that all two-phase commit transactions are delivered via either:

* the transferred snapshot (because their COMMIT PREPARED took place
  before the snapshot was taken in (4)), or

* the decoder stream (because their PREPARE took place after the slot
  was fully created and snapbuilder reached a consistent snapshot)

No transaction can have PREPAREd before (1) but not committed untilafter (4), because we waited for all prepared transactions to commit instep (3).

I think the scheme proposed by you is still not fully clear to me but
can you please explain how in the existing proposed patch there is a
danger of showing transactions as committed without the effects of the
PREPAREs being "visible"?

Please see the `twophase_snapshot` isolation test. The expected outputthere shows the insert from s1 being committed prior to the prepare ofthe transaction in s2.

On a replica applying the stream in that order, a transaction in betweenthese two events would see the results from s1 while still being allowedto lock the row that s2 is about to update. Something I'd expect thePREPARE to prevent.

That is (IMO) wrong in `master` and Ajin's patch doesn't correct it.(While my patch does, so don't look at my patch for this example.)

* Second, it becomes possible to avoid inconsistencies during the
    reconciliation window in between steps 5 and 6 by disallowing
    concurrent (user) transactions to run until after completion of
    step 6.


This second point sounds like a restriction that users might not like.

"It becomes possible" cannot be a restriction. If a user (orreplication solution) wants to allow for these inconsistencies, it stillcan. I want to make sure that solutions which *want* to preventinconsistencies can be implemented.

Your concern applies to step (3), though. The current approach isclearly quicker to restore the backup and start to apply transactions.Until you start to think about reordering the "early" commits untilafter the deferred PREPAREs in the output plugin or on the replica side,so as to lock rows by prepared transactions before making other commitsvisible so as to prevent inconsistencies...

But we need something in existing logic in WALSender or somewhere to
allow supporting 2PC for subscriptions and from your above
description, it is not clear to me how we can achieve that?


I agree that some more code is required somewhere, outside of the walsender.

Regards

Markus

Re: repeated decoding of prepared transactions

Reply via email to