On Mon, Sep 25, 2023 at 2:06 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> > > [1]
> > > https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
> >
> > I see. IIUC, without that commit e0b2eed [1], it may happen that the
> > slot's on-disk confirmed_flush LSN value can be higher than the WAL
> > LSN that's flushed to disk, no?
> >
>
> No, without that commit, there is a very high possibility that even if
> we have sent the WAL to the subscriber and got the acknowledgment of
> the same, we would miss updating it before shutdown. This would lead
> to upgrade failures because upgrades have no way to later identify
> whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0, with an illustration:

step 1: publisher - the confirmed_flush LSN in the replication slot's
on-disk structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN of 100 to disk; note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from the publisher starting at LSN 100, as it tracks the last
applied LSN in the replication origin

Now, if pg_upgrade with the patch in this thread is run on the
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
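
PS: The steps in the illustration above can be sketched as a toy Python
model. This is only a sketch under stated assumptions, not PostgreSQL
code: the Slot class, the persist_on_shutdown flag (standing in for the
behavior of commit e0b2eed0), and the simple LSN comparison used for the
upgrade check are all hypothetical simplifications; the actual check in
the patch is more involved.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy replication slot with in-memory and on-disk confirmed_flush LSNs."""
    in_memory_lsn: int
    on_disk_lsn: int

    def acknowledge(self, lsn: int) -> None:
        # Assumption: a subscriber acknowledgment advances only the
        # in-memory confirmed_flush value, not the on-disk copy.
        self.in_memory_lsn = max(self.in_memory_lsn, lsn)

    def shutdown(self, persist_on_shutdown: bool) -> None:
        # persist_on_shutdown=True stands in for commit e0b2eed0:
        # the slot is written to disk at shutdown.
        if persist_on_shutdown:
            self.on_disk_lsn = self.in_memory_lsn


def upgrade_check_passes(slot: Slot, wal_end_lsn: int) -> bool:
    # Simplified stand-in for the pg_upgrade check: the slot counts as
    # having consumed the WAL only if its on-disk confirmed_flush LSN
    # has reached the end of WAL.
    return slot.on_disk_lsn >= wal_end_lsn


def run_scenario(persist_on_shutdown: bool) -> bool:
    slot = Slot(in_memory_lsn=80, on_disk_lsn=80)  # step 1
    wal_end_lsn = 100                              # step 2: WAL sent up to 100
    slot.acknowledge(100)                          # step 3
    slot.shutdown(persist_on_shutdown)             # step 4
    return upgrade_check_passes(slot, wal_end_lsn)  # pg_upgrade after step 4


if __name__ == "__main__":
    print("without shutdown persistence:", run_scenario(False))
    print("with shutdown persistence:   ", run_scenario(True))
```

In this model the check fails when the slot is not persisted at
shutdown, because the on-disk value is still 80 while the WAL ends at
100, and it passes once the in-memory value is written out at shutdown.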