At Mon, 23 Aug 2021 18:52:17 -0400, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote in
> Included 蔡梦娟 and Jakub Wartak because they've expressed interest on
> this topic -- notably [2] ("Bug on update timing of walrcv->flushedUpto
> variable").
>
> As mentioned in the course of thread [1], we're missing a fix for
> streaming replication to avoid sending records that the primary hasn't
> fully flushed yet.  This patch is a first attempt at fixing that problem
> by retreating the LSN reported as FlushPtr whenever a segment is
> registered, based on the understanding that if no registration exists
> then the LogwrtResult.Flush pointer can be taken at face value; but if a
> registration exists, then we have to stream only till the start LSN of
> that registered entry.
>
> This patch is probably incomplete.  First, I'm not sure that logical
> replication is affected by this problem.  I think it isn't, because
> logical replication will halt until the record can be read completely --
> maybe I'm wrong and there is a way for things to go wrong with logical
> replication as well.  But also, I need to look at the other uses of
> GetFlushRecPtr() and see if those need to change to the new function too
> or they can remain what they are now.
>
> I'd also like to have tests.  That seems moderately hard, but if we had
> WAL-molasses that could be used in walreceiver, it could be done. (It
> sounds easier to write tests with a molasses-archive_command.)
>
>
> [1] https://postgr.es/m/cbddfa01-6e40-46bb-9f98-9340f4379...@amazon.com
> [2]
> https://postgr.es/m/3f9c466d-d143-472c-a961-66406172af96.mengjuan....@alibaba-inc.com

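For reference, the rule described in the quoted text (take the flush pointer at face value when nothing is registered, otherwise stream only up to the start LSN of the registered entry) can be sketched roughly as below. This is a standalone illustration and not code from the patch: SegBoundaryReg and safe_send_ptr are hypothetical names, XLogRecPtr is redefined locally as a stand-in, and clamping to the flush pointer when the registered start lies beyond it is my own assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t XLogRecPtr;

/* Hypothetical: describes a registered record that crosses a segment boundary. */
typedef struct SegBoundaryReg
{
    bool        valid;      /* true if a registration currently exists */
    XLogRecPtr  start_lsn;  /* start LSN of the registered record */
} SegBoundaryReg;

/*
 * Hypothetical helper: return the LSN up to which it would be safe to stream,
 * given the current flush pointer and the registration state.
 */
static XLogRecPtr
safe_send_ptr(XLogRecPtr flush_ptr, const SegBoundaryReg *reg)
{
    /* No registration: the flush pointer can be taken at face value. */
    if (!reg->valid)
        return flush_ptr;

    /*
     * A registration exists: retreat to the start of the registered record.
     * Assumption: never report a position beyond what has been flushed.
     */
    return (reg->start_lsn < flush_ptr) ? reg->start_lsn : flush_ptr;
}
```

With a flush pointer of 1000 and a registration starting at 800, this would report 800; without a registration it would report 1000.
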
(I'm not sure what "WAL-molasses" above means. Is it the same kind of
thing as "sugar"?)

For the record, this issue is related to commit 0668719801, which makes
XLogPageRead restart reading a (continued or segment-spanning) record
when it switches sources. In that thread, I modified the code to make
the server crash in the desired situation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center