Re: repeated decoding of prepared transactions

Amit Kapila Tue, 23 Feb 2021 01:54:20 -0800

On Mon, Feb 22, 2021 at 2:57 PM Andres Freund <and...@anarazel.de> wrote:
>
> > Yeah, we need to probably store this new point as slot's persistent 
> > information.
>
> Should be fine I think...
>

So, we are in agreement that the above solution will work and we won't
need to resend the prepare after the restart. I would like to once
again describe a few other points which we are discussing in this and
the other thread [1] to see if you or others have a different opinion on
those:

1. With respect to SQL APIs, currently 'two-phase-commit' is a plugin
option, so it is possible that the first time the user gets changes
(pg_logical_slot_get_changes) *without* 2PC enabled, the prepare is not
returned even though it is after the consistent snapshot. The next
time the user gets changes (pg_logical_slot_get_changes) with the 2PC
option enabled, the prepare is skipped because by that time
start_decoding_at has been moved. So the user will only get the commit
prepared, as shown in the example in the email above [2]. I think it
might be better to allow enabling/disabling 2PC only at create_slot
time. Markus, Ajin, and I seem to be in agreement on this point. I
think the same will be true for the subscriber-side solution as well.
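
To make point 1 concrete, here is a toy Python model of the sequence. It
is only a sketch: Slot, get_changes and the WAL tuples are made-up names
for illustration, not PostgreSQL internals, and the non-2PC output
("COMMIT ... (with data)") is my assumption about how such a transaction
is reported when 2PC is off.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    start_decoding_at: int = 0  # toy stand-in for the slot's restart position


def get_changes(slot, wal, two_phase):
    """Toy decoder: return what the consumer sees, then advance the slot."""
    out = []
    for lsn, kind, xid in wal:
        if lsn < slot.start_decoding_at:
            continue  # consumed by an earlier call, never looked at again
        if kind == "PREPARE" and two_phase:
            out.append(f"PREPARE {xid}")
        elif kind == "COMMIT PREPARED":
            # With 2PC on, the decoder assumes the prepare was already sent.
            out.append(f"COMMIT PREPARED {xid}" if two_phase
                       else f"COMMIT {xid} (with data)")
    if wal:
        slot.start_decoding_at = max(slot.start_decoding_at, wal[-1][0] + 1)
    return out


wal = [(100, "PREPARE", "t1"), (200, "COMMIT PREPARED", "t1")]

# Option toggled between calls: the prepare is lost.
toggled = Slot()
first = get_changes(toggled, wal[:1], two_phase=False)
second = get_changes(toggled, wal, two_phase=True)

# Option fixed for the lifetime of the slot: the prepare is delivered.
fixed = Slot()
fixed_first = get_changes(fixed, wal[:1], two_phase=True)
fixed_second = get_changes(fixed, wal, two_phase=True)
```

In the toggled case the second call returns only the commit prepared,
while the slot with 2PC fixed from the start returns the prepare on the
first call and the commit prepared on the second.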

2. There is a possibility that subscribers miss some prepared xacts.
Let me explain the problem and solution. Currently, when we create a
subscription, we first launch the apply worker and create the main apply
worker slot, and then launch tablesync workers as required. Now,
assume the apply worker slot is created and after that we launch a
tablesync worker, which will initiate its slot (sync_slot) creation.
Then, on the publisher side, suppose there is a prepared transaction
that happens before we reach a consistent snapshot for sync_slot.

Because the WALSender corresponding to the apply worker is already
running, it will be in a consistent state, so it can decode such a
prepared xact and send it to the subscriber. On the subscriber side,
the apply worker can skip applying the data-modification operations
because the corresponding rel is still not in a ready state (see
should_apply_changes_for_rel and its callers), simply because the
corresponding tablesync worker has not finished yet. But the prepare
will still occur and it will lead to a prepared transaction on the
subscriber.

In this situation, the tablesync worker has skipped the prepare because
the snapshot was not consistent, and then it exited because it is in
sync with the apply worker. And the apply worker has skipped the data
because tablesync was in progress. Later, when the commit prepared
arrives, the apply worker will simply commit the previously prepared
transaction and we will never see the prepared transaction's data.

For example, consider the situation below:
LSN of Prepare t1 = 490, tablesync skipped because it was prior to a
consistent point
LSN of Commit t2 = 500
LSN of Commit t3 = 510
LSN of Commit Prepared t1 = 520.

The tablesync worker initially (via copy) got till xact t3 (LSN = 510).
The apply worker gets all of the above LSNs, as it is started
before the tablesync worker and reached a consistent point before it. In
the above example, there is a possibility that we miss applying the data
for xact t1, as explained in the previous paragraphs.
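
The example can be replayed with a small Python sketch. This is a toy
model under my own assumptions, not the actual apply-worker code: the
function name replay, the tuple layout, and the rule "the apply worker
applies a rel's changes only past the tablesync position" are
simplifications of what should_apply_changes_for_rel decides.

```python
WAL = [
    (490, "PREPARE", "t1"),
    (500, "COMMIT", "t2"),
    (510, "COMMIT", "t3"),
    (520, "COMMIT PREPARED", "t1"),
]


def replay(wal, sync_done_lsn, copied_xacts):
    """Return the set of xacts whose data ends up in the subscriber table."""
    table = set(copied_xacts)  # rows brought in by the tablesync COPY
    prepared = {}              # xid -> data held by the subscriber's prepared xact
    for lsn, kind, xid in wal:
        rel_ready = lsn > sync_done_lsn  # apply only past the sync position
        if kind == "PREPARE":
            # Data changes are skipped for a not-ready rel, but the
            # prepare itself still happens (with nothing in it).
            prepared[xid] = {xid} if rel_ready else set()
        elif kind == "COMMIT":
            if rel_ready:
                table.add(xid)
        elif kind == "COMMIT PREPARED":
            table |= prepared.pop(xid)  # commits whatever was prepared
    return table


# COPY saw the committed t2 and t3 (up to LSN 510) but not the
# still-uncommitted prepared xact t1.
result = replay(WAL, sync_done_lsn=510, copied_xacts={"t2", "t3"})
```

Here result contains t2 and t3 only: the commit prepared at LSN 520
commits an empty prepared transaction, so the data of t1 is lost.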

So, the basic premise is that we can't allow tablesync workers to skip
prepared transactions (which can be processed by apply worker) and
process later commits.

I have one idea to address this. When we get the first begin (for a
prepared xact) in the apply worker, we can check if there are any
relations in "not_ready" state and, if so, just wait till all the
relations are in sync with the apply worker. This avoids the case where
a tablesync worker skips a prepared xact and the apply worker skips
the same xact as well.

Now, it is possible that some tablesync worker has copied the data and
moved the sync position ahead of the apply worker's current
position. In such a case, we need to process transactions in the apply
worker such that we can process commits, if any, and write prepared
transactions to a file. For prepared transactions, we can take decisions
only once the commit prepared for them has arrived.
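
A rough Python sketch of this idea follows. It is only one possible
reading of it: the name replay_with_spooling, the in-memory dict
standing in for the file, and the decision rule (apply the spooled
changes when the commit prepared LSN is past the tablesync position)
are my assumptions for illustration, not a worked-out design.

```python
WAL = [
    (490, "PREPARE", "t1"),
    (500, "COMMIT", "t2"),
    (510, "COMMIT", "t3"),
    (520, "COMMIT PREPARED", "t1"),
]


def replay_with_spooling(wal, sync_done_lsn, copied_xacts):
    """Return the set of xacts whose data ends up in the subscriber table."""
    table = set(copied_xacts)  # rows brought in by the tablesync COPY
    spooled = {}               # xid -> changes set aside at PREPARE time
    for lsn, kind, xid in wal:
        if kind == "PREPARE":
            spooled[xid] = {xid}  # keep the data instead of dropping it
        elif kind == "COMMIT":
            if lsn > sync_done_lsn:  # ordinary commits: skip what COPY covers
                table.add(xid)
        elif kind == "COMMIT PREPARED":
            changes = spooled.pop(xid)
            # Decide only now: if the commit prepared is past the sync
            # position, the COPY could not have seen this xact's data.
            if lsn > sync_done_lsn:
                table |= changes
    return table


result = replay_with_spooling(WAL, sync_done_lsn=510, copied_xacts={"t2", "t3"})
late_copy = replay_with_spooling(WAL, sync_done_lsn=520,
                                 copied_xacts={"t1", "t2", "t3"})
```

With the LSNs from the example above, result now contains t1, t2 and
t3. In the late_copy case the COPY already included t1, so the spooled
changes are discarded and nothing is applied twice.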

The other idea I have thought of is to enable 2PC only after the
initial sync (when both the apply worker and tablesync workers are in
sync) is over, but I think that can lead to the problem described in
point 1.

[1] -
https://www.postgresql.org/message-id/CAA4eK1L%3DdhuCRvyDvrXX5wZgc7s1hLRD29CKCK6oaHtVCPgiFA%40mail.gmail.com
[2] -
https://www.postgresql.org/message-id/CAFPTHDbbth0XVwf%3DWXcmp%3D_2nU5oNaK4CxetUr22qi1UM5v6rw%40mail.gmail.com

--
With Regards,
Amit Kapila.
