On Fri, Jul 16, 2021 at 1:07 PM Drouvot, Bertrand <bdrou...@amazon.com> wrote:
> Hi Andres, > > On 6/22/21 12:38 PM, Drouvot, Bertrand wrote: > > Hi Andres, > > > > On 6/14/21 7:41 AM, Drouvot, Bertrand wrote: > >> Hi Andres, > >> > >> On 4/8/21 5:47 AM, Andres Freund wrote: > >>> Hi, > >>> > >>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote: > >>>> While working on this I found a, somewhat substantial, issue: > >>>> > >>>> When the primary is idle, on the standby logical decoding via > >>>> walsender > >>>> will typically not process the records until further WAL writes > >>>> come in > >>>> from the primary, or until a 10s lapsed. > >>>> > >>>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to > >>>> increase, but gets woken up by walreceiver when new WAL has been > >>>> flushed. Which means that typically walsenders will get woken up at > >>>> the > >>>> same time that the startup process will be - which means that by the > >>>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely > >>>> that the startup process already replayed the record and updated > >>>> XLogCtl->lastReplayedEndRecPtr. > >>>> > >>>> I think fixing this would require too invasive changes at this > >>>> point. I > >>>> think we might be able to live with 10s delay issue for one > >>>> release, but > >>>> it sure is ugly :(. > >>> This is indeed pretty painful. It's a lot more regularly occuring if > >>> you > >>> either have a slot disk, or you switch around the order of > >>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush(). > >>> > >>> - There's about which timeline to use. If you use pg_recvlogical and > >>> you > >>> restart the server, you'll see errors like: > >>> > >>> pg_recvlogical: error: unexpected termination of replication > >>> stream: ERROR: requested WAL segment 000000000000000000000003 has > >>> already been removed > >>> > >>> the real filename is 000000010000000000000003 - i.e. the timeline is > >>> 0. > >>> > >>> This isn't too hard to fix, but definitely needs fixing. > >> > >> Thanks, nice catch! > >> > >> From what I have seen, we are not going through InitXLOGAccess() on a > >> Standby and in some cases (like the one you mentioned) > >> StartLogicalReplication() is called without IdentifySystem() being > >> called previously: this lead to ThisTimeLineID still set to 0. > >> > >> I am proposing a fix in the attached v18 by adding a check in > >> StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved. > >> > >>> > >>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially > >>> leading us to drop a slot that has been created since we signalled a > >>> recovery conflict. See > >>> > https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de > >>> > >>> for some very similar issues. > >> > >> I have rewritten this part by following the same logic as the one > >> used in 96540f80f8 (the commit linked to the thread you mentioned). > >> > >>> > >>> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to > >>> just drop the logical slots. Instead we should just mark them as > >>> invalid, like InvalidateObsoleteReplicationSlots(). > >> > >> Makes fully sense and done that way in the attached patch. > >> > >> I am setting the slot's data.xmin and data.catalog_xmin as > >> InvalidTransactionId to mark the slot(s) as invalid in case of conflict. > >> > >>> - There's no tests covering timeline switches, what happens if > >>> there's a > >>> promotion if logical decoding is currently ongoing. > >> > >> I'll now work on the tests. > >> > >>> > >>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error > >>> message is not good (and I've complained about it before...). > >> > >> I changed it and made it more simple. > >> > >> I also removed the details around mentioning xmin or catalog xmin (as > >> I am not sure of the added value and they are currently also not > >> mentioned during standby recovery snapshot conflict). > >> > >>> > >>> Unfortunately I think the things I have found are too many for me to > >>> address within the given time. I'll send a version with a somewhat > >>> polished set of the changes I made in the next few days... > >> > >> Thanks for the review and feedback. > >> > >> Please find enclosed v18 with the changes I worked on. > >> > >> I still need to have a look on the tests. > > > > Please find enclosed v19 that also contains the changes related to > > your TAP tests remarks, mainly: > > > > - get rid of 024 and add more tests in 026 (025 has been used in the > > meantime) > > > > - test that logical decoding actually produces useful and correct results > > > > - test standby promotion and logical decoding behavior once done > > > > - useless "use" removal > > > > - check_confl_logicalslot() function removal > > > > - rewrote make_slot_active() to make use of poll_query_until() and > > timeout > > > > - remove the useless eval() > > > > - remove the "Catalog xmins should advance after standby logical slot > > fetches the changes" test > > > > One thing that's not clear to me is your remark "There's also no test > > for a recovery conflict due to row removal": Don't you think that the > > "vacuum full" conflict test is enough? if not, what kind of additional > > tests would you like to see? > > > >> > >> There is also the 10s delay to work on, do you already have an idea > >> on how we should handle it? > >> > >> Thanks > >> > >> Bertrand > >> > > Thanks > > > > Bertrand > > > Please find enclosed v20 a needed rebase (nothing serious worth > mentioning) of v19. > > FWIW, just to sum up that v19 (and so v20): > > - contained the changes (see details above) related to your TAP tests > remarks > > - contained the changes (see details above) related to your code remarks > > There is still the 10s delay thing that need work: do you already have > an idea on how we should handle it? > > And still one thing that's not clear to me is your remark "There's also > no test for a recovery conflict due to row removal": Don't you think > that the "vacuum full" conflict test is enough? if not, what kind of > additional tests would you like to see? > > Thanks > > Bertrand > > > > > The patch does not apply and an updated patch is required. patching file src/include/replication/slot.h Hunk #1 FAILED at 214. 1 out of 2 hunks FAILED -- saving rejects to file src/include/replication/slot.h.rej -- Ibrar Ahmed