On 20.02.21 13:15, Amit Kapila wrote:
I think after the patch Ajin proposed decoders won't need any special
checks after receiving the prepared xacts. What additional simplicity
this approach will bring?

The API becomes clearer in that all PREPAREs are always decoded in WAL stream order and are not ever deferred (possibly until after the commits of many other transactions). No output plugin will need to check against this peculiarity, but can rely on WAL ordering of events.

(And if an output plugin does not want prepares to be individual events, it should simply not enable two-phase support. That seems like something the output plugin could even do on a per-transaction basis.)

Do you mean to say that after creating the slot we take an additional
pass over WAL (till the LSN where we found a consistent snapshot) to
collect all prepared transactions and wait for them to get
committed/rollbacked?

No. A single pass is enough, the decoder won't need any further change beyond the code removal in my patch.

I'm proposing for the synchronization logic (in e.g. pgoutput) to defer the snapshot taking. So that there's some time in between creating the logical slot (at step 1.) and taking a snapshot (at step 4.). Another CATCHUP phase, if you want.

So that all two-phase commit transactions are delivered via either:

* the transferred snapshot (because their COMMIT PREPARED took place
  before the snapshot was taken in (4)), or

* the decoder stream (because their PREPARE took place after the slot
  was fully created and snapbuilder reached a consistent snapshot)

No transaction can have PREPAREd before (1) but not committed until after (4), because we waited for all prepared transactions to commit in step (3).

I think the scheme proposed by you is still not fully clear to me but
can you please explain how in the existing proposed patch there is a
danger of showing transactions as committed without the effects of the
PREPAREs being "visible"?

Please see the `twophase_snapshot` isolation test. The expected output there shows the insert from s1 being committed prior to the prepare of the transaction in s2.

On a replica applying the stream in that order, a transaction in between these two events would see the results from s1 while still being allowed to lock the row that s2 is about to update. Something I'd expect the PREPARE to prevent.

That is (IMO) wrong in `master` and Ajin's patch doesn't correct it. (While my patch does, so don't look at my patch for this example.)

* Second, it becomes possible to avoid inconsistencies during the
    reconciliation window in between steps 5 and 6 by disallowing
    concurrent (user) transactions to run until after completion of
    step 6.

This second point sounds like a restriction that users might not like.

"It becomes possible" cannot be a restriction. If a user (or replication solution) wants to allow for these inconsistencies, it still can. I want to make sure that solutions which *want* to prevent inconsistencies can be implemented.

Your concern applies to step (3), though. The current approach is clearly quicker to restore the backup and start to apply transactions. Until you start to think about reordering the "early" commits until after the deferred PREPAREs in the output plugin or on the replica side, so as to lock rows by prepared transactions before making other commits visible so as to prevent inconsistencies...

But we need something in existing logic in WALSender or somewhere to
allow supporting 2PC for subscriptions and from your above
description, it is not clear to me how we can achieve that?

I agree that some more code is required somewhere, outside of the walsender.

Regards

Markus


Reply via email to