On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> <kuroda.hay...@fujitsu.com> wrote:
> >
> > Yeah, the approach enforces developers to check the decodability.
> > But the benefit seems smaller than required efforts for it because the function
> > would be used only by pg_upgrade. Could you tell me if you have another use case
> > in mind? We may able to adopt if we have...
>
> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible, better than the current approach (hard-coding
> of certain WAL records that may be generated during pg_upgrade) taken
> by the patch, and helps deal with the issue that custom WAL resource
> managers can have with the current approach taken by the patch.
>

Today, I discussed this problem with Andres at PGConf NYC and he
suggested the following.

To verify whether there is any pending unexpected WAL after shutdown,
we can have an API like pg_logical_replication_slot_advance() which
will simply process records without actually sending anything
downstream. In this new API, we will start from each slot's
restart_lsn and try to process till the end of WAL. If we encounter
any WAL that needs to be processed (i.e., we would need to send the
decoded WAL downstream), we can return false, indicating that there is
unexpected WAL. The reason to start with restart_lsn is that it is the
location that we use to start scanning the WAL anyway.

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as the new location (restart_lsn) to create the slots in the
new node. Then, pass it to pg_resetwal by using the existing option
'-l walfile'. As we don't have any API that takes restart_lsn as
input, we can write a new API, probably for binary mode, to create
slots that do take restart_lsn as input. This will ensure that no new
WAL is inserted by background processes between resetwal and the
creation of slots.

The other potential problem Andres pointed out is that if, for some
reason, the walreceiver goes down during shutdown, we won't be able to
send the required WAL, and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WAL. As we don't have such functionality available and it
won't be easy to achieve, we can leave this for now.

Thoughts?

--
With Regards,
Amit Kapila.
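
To make the first idea concrete, here is a rough simulation of the check: start at a slot's restart_lsn, walk to the end of WAL, and return false as soon as a record would have to be sent downstream. This is only a sketch and not PostgreSQL code. The names WalRecord, decodable, and slot_has_no_pending_wal are hypothetical, and the real check would go through the logical decoding machinery rather than a flag on each record.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical stand-in for a WAL record (not a PostgreSQL structure)."""
    lsn: int         # start position of the record
    decodable: bool  # assumed flag: True if decoding would send it downstream


def slot_has_no_pending_wal(records: Iterable[WalRecord], restart_lsn: int) -> bool:
    """Scan from restart_lsn to the end of WAL without sending anything.

    Returns False as soon as a record at or after restart_lsn would need
    to be sent downstream, i.e. there is unexpected WAL for this slot.
    """
    for rec in records:
        if rec.lsn < restart_lsn:
            continue  # before the slot's scan start, not relevant
        if rec.decodable:
            return False
    return True


# Illustrative WAL stream: one change, then two records with nothing to send.
wal = [
    WalRecord(lsn=100, decodable=True),
    WalRecord(lsn=200, decodable=False),
    WalRecord(lsn=300, decodable=False),
]

caught_up = slot_has_no_pending_wal(wal, restart_lsn=200)
lagging = slot_has_no_pending_wal(wal, restart_lsn=100)
```

With this toy data, a slot whose restart_lsn is 200 passes the check, while a slot whose restart_lsn is 100 fails it because the change at LSN 100 has not been consumed.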