On Mon, Feb 22, 2021 at 2:57 PM Andres Freund <and...@anarazel.de> wrote:
>
> > Yeah, we need to probably store this new point as slot's persistent
> > information.
>
> Should be fine I think...
>

So, we are in agreement that the above solution will work and we won't need to resend the prepare after the restart. I would like to once again describe a few other points that we are discussing in this and the other thread [1], to see if you or others have a different opinion on them:

1. With respect to the SQL APIs, 'two-phase-commit' is currently a plugin option, so it is possible that the first time the user gets changes (pg_logical_slot_get_changes) *without* 2PC enabled, they will not get the prepare even though the prepare is after the consistent snapshot. The next time they get changes (pg_logical_slot_get_changes) with the 2PC option enabled, decoding will skip the prepare because by that time start_decoding_at has been moved past it. So the user will only get the commit prepared, as shown in the example in the email above [2]. I think it might be better to allow enabling/disabling 2PC only at create_slot time. Markus, Ajin, and I seem to be in agreement on this point. I think the same will be true for the subscriber-side solution as well.

2. There is a possibility that subscribers miss some prepared xacts. Let me explain the problem and a solution. Currently, when we create a subscription, we first launch the apply worker and create the main apply worker slot, and then launch tablesync workers as required. Now, assume the apply worker slot is created and after that we launch a tablesync worker, which will initiate the creation of its own slot (sync_slot). Then, on the publisher side, suppose there is a prepared transaction that happens before we reach a consistent snapshot for sync_slot. Because the WALSender corresponding to the apply worker is already running, it will be in a consistent state; for it, such a prepared xact can be decoded, and it will send the same to the subscriber.
On the subscriber side, the apply worker can skip applying the data-modification operations because the corresponding rel is not yet in a ready state (see should_apply_changes_for_rel and its callers), simply because the corresponding tablesync worker has not finished yet. But the prepare itself will still occur, and it will lead to a prepared transaction on the subscriber. In this situation, the tablesync worker has skipped the prepare because its snapshot was not yet consistent, and then it exited because it was in sync with the apply worker. And the apply worker has skipped the changes because tablesync was in progress. Later, when the commit prepared arrives, the apply worker will simply commit the previously prepared transaction, and we will never see the prepared transaction's data.

For example, consider the situation below:

LSN of Prepare t1 = 490, tablesync skipped because it was prior to a consistent point
LSN of Commit t2 = 500
LSN of Commit t3 = 510
LSN of Commit Prepared t1 = 520

The tablesync worker initially (via copy) got data up to xact t3 (LSN = 510). The apply worker gets all of the above LSNs, as it was started before the tablesync worker and reached a consistent point before it.

In the above example, there is a possibility that we miss applying the data for xact t1, as explained in the previous paragraphs. So, the basic premise is that we can't allow tablesync workers to skip prepared transactions (which can be processed by the apply worker) and then process later commits.

I have one idea to address this. When we get the first begin (for a prepared xact) in the apply worker, we can check whether there are any relations in the "not_ready" state and, if so, just wait until all the relations are in sync with the apply worker. This avoids the case where a tablesync worker skips a prepared xact and the apply worker then skips it as well. Now, it is possible that some tablesync worker has copied the data and moved the sync position ahead of the apply worker's current position.
In such a case, we need to process transactions in the apply worker such that we can process commits, if any, and write prepared transactions to a file. For prepared transactions, we can make a decision only once the commit prepared for them has arrived.

The other idea I have thought of for this is to enable 2PC only after the initial sync is over (when both the apply worker and the tablesync workers are in sync), but I think that can lead to the problem described in point 1.

[1] - https://www.postgresql.org/message-id/CAA4eK1L%3DdhuCRvyDvrXX5wZgc7s1hLRD29CKCK6oaHtVCPgiFA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAFPTHDbbth0XVwf%3DWXcmp%3D_2nU5oNaK4CxetUr22qi1UM5v6rw%40mail.gmail.com

--
With Regards,
Amit Kapila.
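
P.S. To make the LSN example in point 2 easier to follow, here is a minimal toy model in Python. It is only a sketch and not PostgreSQL code: Record, WAL, deliver, sync_lsn and defer_prepared are hypothetical names invented for the illustration. It assumes a single relation, assumes that the tablesync copy contains exactly the data committed at or before sync_lsn, and models "write prepared transactions to file" as adding the xact to a set. The wait for not_ready relations is not modeled; sync_lsn is treated as already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    lsn: int
    kind: str  # "prepare", "commit" or "commit_prepared"
    xact: str


# The WAL stream from the example in point 2.
WAL = [
    Record(490, "prepare", "t1"),
    Record(500, "commit", "t2"),
    Record(510, "commit", "t3"),
    Record(520, "commit_prepared", "t1"),
]


def deliver(wal, sync_lsn, defer_prepared):
    """Return {xact: "copy" | "apply"} for xacts whose data reaches the table.

    sync_lsn: position up to which the tablesync copy holds committed data.
    defer_prepared: if True, spool prepared xacts and decide at commit
    prepared time (the proposed idea); if False, decide at prepare time.
    """
    source = {}
    spooled = set()             # prepared xacts kept aside ("written to file")
    prepared_with_data = set()  # prepared xacts whose changes were applied

    for rec in wal:
        if rec.kind == "commit":
            # Either the copy already has it, or the apply worker applies it.
            source[rec.xact] = "copy" if rec.lsn <= sync_lsn else "apply"
        elif rec.kind == "prepare":
            if defer_prepared:
                spooled.add(rec.xact)
            elif rec.lsn > sync_lsn:
                prepared_with_data.add(rec.xact)
            # Otherwise the changes are skipped (rel not ready for this LSN)
            # and only an empty transaction is prepared.
        elif rec.kind == "commit_prepared":
            if rec.lsn <= sync_lsn:
                source[rec.xact] = "copy"
            elif rec.xact in spooled or rec.xact in prepared_with_data:
                source[rec.xact] = "apply"
            # Otherwise an empty prepared xact is committed: data is lost.
    return source


if __name__ == "__main__":
    print("decide at prepare:        ", deliver(WAL, 510, defer_prepared=False))
    print("decide at commit prepared:", deliver(WAL, 510, defer_prepared=True))
```

With sync_lsn = 510, deciding at prepare time leaves t1 out of the result, which is the missed-data case described above, whereas deferring the decision until the commit prepared at 520 lets the apply worker deliver t1.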