On Fri, Mar 12, 2021 at 2:09 PM Peter Smith <smithpb2...@gmail.com> wrote: > > Please find attached the latest patch set v58* >
In this patch-series, I see a problem with synchronous replication when GUC 'synchronous_standby_names' is configured to use subscriber. This will allow Prepares and Commits to wait for the subscriber to finish. Before this patch, we never send prepare as two-phase was not enabled by a subscriber, so it won't wait for it, rather it will make progress because we send keep_alive messages. But after this patch, it will start waiting for Prepare to finish. Now, without spool-file logic, it will work because prepares are decoded on subscriber and a corresponding ack will be sent to a publisher but for the spool-file case, we will wait for Publisher to send commit prepared and in publisher prepare is not finished because we are waiting for its ack. So, it will create a sort of deadlock. This is related to the problem as mentioned in the below comments in the patch: + * A future release may be able to detect when all tables are READY and set + * a flag to indicate this subscription/slot is ready for two_phase + * decoding. Then at the publisher-side, we could enable wait-for-prepares + * only when all the slots of WALSender have that flag set. The difference is that it can happen now itself, prepares automatically wait if 'synchronous_standby_names' is set. Now, we can imagine a solution where after spooling to file the changes which can't be applied during syncup phase, we update the flush location so that publisher can proceed with that prepare. But I think that won't work because once we have updated the flush location those prepares won't be sent again and it is quite possible that we don't have complete relation information as the schema is not sent with each transaction. Now, we can go one step further and try to remember the schema information the first time it is sent so that it can be reused after restart but I think that will complicate the patch and overall design. I think there is a simpler solution to these problems. The idea is to enable two_phase after the initial sync is over (all relations are in a READY state). If we switch-on the 2PC only after all the relations come to the READY state then we shouldn't get any prepare before sync-point. However, it is quite possible that before reaching syncpoint, the slot corresponding to apply-worker has skipped because 2PC was not enabled, and afterward, prepare would be skipped because by that start_decoding_at might have moved. See the explanation in an email: https://www.postgresql.org/message-id/CAA4eK1LuK4t-ZYYCY7k9nMoYP%2Bdwi-JyqUdtcffQMoB_g5k6Hw%40mail.gmail.com. Now, even the initial_consistent_point won't help because for apply-worker, it will be different from tablesync slot's initial_consistent_point and we would have reached initial consistency earlier for apply-workers. To solve the main problem (how to detect the prepares that are skipped when we toggled the two_pc option) in the above idea, we can mark an LSN position in the slot (two_phase_at, this will be the same as start_decoding_at point when we receive slot with 2PC option) where we enable two_pc. If we encounter any commit prepared whose prepare LSN is less than two_phase_at, then we need to send prepare for the transaction along with commit prepared. For this solution on the subscriber-side, I think we need a tri-state column (two_phase) in pg_subscription. It can have three values 'disable', 'can_enable', 'enable'. By default, it will be 'disable'. If the user enables 2PC, then we can set it to 'can_enable' and once we see all relations are in a READY state, restart the apply-worker and this time while starting the streaming, send the two_pc option and then we can change the state to 'enable' so that future restarts won't send this option again. Now on the publisher side, if this option is present, it will change the value of two_phase_at in the slot to start_decoding_at. I think something on these lines should be much easier than the spool-file implementation unless we see any problem with this idea. -- With Regards, Amit Kapila.