Hi,

On 2021-02-10 08:02:17 +0530, Amit Kapila wrote:
> On Wed, Feb 10, 2021 at 12:08 AM Robert Haas <robertmh...@gmail.com> wrote:
> >
> > On Tue, Feb 9, 2021 at 6:57 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > I think similar happens without any of the work done in PG-14 as well
> > > if we restart the apply worker before the commit completes on the
> > > subscriber. After the restart, we will send the start_decoding_at
> > > point based on some previous commit which will make publisher send the
> > > entire transaction again. I don't think restart of WAL sender or WAL
> > > receiver is such a common thing. It can only happen due to some bug in
> > > code or user wishes to stop the nodes or some crash happened.
> >
> > Really? My impression is that the logical replication protocol is
> > supposed to be designed in such a way that once a transaction is
> > successfully confirmed, it won't be sent again. Now if something is
> > not confirmed then it has to be sent again. But if it is confirmed
> > then it shouldn't happen.

Correct.

> If by successfully confirmed, you mean that once the subscriber node
> has received, it won't be sent again then as far as I know that is not
> true. We rely on the flush location sent by the subscriber to advance
> the decoding locations. We update the flush locations after we apply
> the transaction's commit successfully. Also, after the restart, we use
> the replication origin's last flush location as a point from where we
> need the transactions and the origin's progress is updated at commit
> time.

That's not quite right. Yes, the flush location isn't guaranteed to be
updated at that point, but a replication client will send the last
location they've received and successfully processed, and that has to
*guarantee* that they won't receive anything twice, or miss
something. Otherwise you've broken the protocol.

Greetings,

Andres Freund
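
PS: To make the guarantee above concrete, here is a toy model of it. This
is only a sketch and not PostgreSQL code: the Publisher/Subscriber
classes, the LSN values and the crash_after knob are all made up for
illustration. The one assumption that matters is that the subscriber
advances its progress marker together with applying the transaction, and
that on reconnect it asks for everything past that marker.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    # (commit_lsn, payload) pairs in commit order; the LSNs are made up.
    txns: list

    def stream(self, start_lsn):
        """Send every transaction whose commit LSN lies past start_lsn."""
        return [t for t in self.txns if t[0] > start_lsn]


@dataclass
class Subscriber:
    applied: list = field(default_factory=list)
    origin_lsn: int = 0  # last commit LSN received and successfully processed

    def apply(self, txn):
        commit_lsn, payload = txn
        # The data and the progress marker advance together, standing in
        # for an atomic commit on the subscriber side.
        self.applied.append(payload)
        self.origin_lsn = commit_lsn


def replicate(pub, sub, crash_after=None):
    """Run one connection; optionally stop after crash_after transactions."""
    for n, txn in enumerate(pub.stream(sub.origin_lsn)):
        if crash_after is not None and n == crash_after:
            return  # simulated restart before this transaction is applied
        sub.apply(txn)


pub = Publisher([(100, "t1"), (200, "t2"), (300, "t3")])
sub = Subscriber()
replicate(pub, sub, crash_after=2)  # applies t1 and t2, then "restarts"
replicate(pub, sub)                 # reconnects, asking for changes past LSN 200
print(sub.applied)
```

In this model nothing is applied twice and nothing is skipped across the
restart, because the position the client reports is exactly the last
thing it processed. A separately updated flush location can lag behind
that without affecting correctness.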