On Tue, Aug 13, 2019 at 2:20 PM Michael Paquier <mich...@paquier.xyz> wrote:
> On Tue, Aug 13, 2019 at 11:15:42AM +1200, Thomas Munro wrote:
> > One thing I noticed in passing is that you always get the same times
> > in the write_lag and flush_lag columns, in --synchronous mode, and the
> > times update infrequently. That's not the case with regular
> > replicas; I suspect there is a difference in the time and frequency of
> > replies sent to the server, which I guess might make synchronous
> > commit a bit "lumpier", but I didn't dig further today.
>
> The messages are sent by pg_receivewal via sendFeedback() in
> receivelog.c. It gets triggered for the --synchronous case once a
> flush is done (but you are not surprised by my reply here, right!),
> and most likely the matches you are seeing come from the messages sent
> at the beginning of HandleCopyStream() where the flush and write
> LSNs are equal. This code behaves as I would expect based on your
> description and a read of the code I have just done to refresh my
> mind, but we may of course have some issues or potential
> improvements.

Right. For a replica server we call XLogWalRcvSendReply() after
writing, and then again inside XLogWalRcvFlush(). So the primary gets
to measure write_lag and flush_lag separately. If pg_receivewal just
sends one reply after flushing, then turning on --synchronous has the
effect of showing the flush lag in both write_lag and flush_lag
columns.

Of course those things aren't quite as independent as they should be
anyway, since the flush is blocking and therefore delays the next
write. <mind-reading-mode>That's why Simon probably wants to move the
flush to the WAL writer process, and Andres probably wants to change
the whole thing to use some kind of async IO[1].</mind-reading-mode>

[1] https://lwn.net/Articles/789024/

-- 
Thomas Munro
https://enterprisedb.com
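
P.S. To make the one-reply versus two-replies difference concrete, here
is a toy model in Python. It is only a sketch under stated assumptions,
not PostgreSQL's actual lag-tracking code: the names Reply and
observed_lags are hypothetical, the timings are made up, and network
latency is ignored. It assumes the primary takes each lag from the first
reply that reports the LSN as written or flushed.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """One feedback message from the receiver (hypothetical model)."""
    sent_at: float   # when the reply reaches the primary, in ms
    write_lsn: int   # highest LSN reported as written
    flush_lsn: int   # highest LSN reported as flushed


def observed_lags(lsn, generated_at, replies):
    """Return (write_lag, flush_lag) for one LSN as seen by the primary.

    Assumption: each lag is the time from WAL generation to the first
    reply whose reported position covers the LSN.
    """
    write_lag = flush_lag = None
    for r in sorted(replies, key=lambda r: r.sent_at):
        if write_lag is None and r.write_lsn >= lsn:
            write_lag = r.sent_at - generated_at
        if flush_lag is None and r.flush_lsn >= lsn:
            flush_lag = r.sent_at - generated_at
    return write_lag, flush_lag


LSN = 100
WRITE_DONE = 2.0    # made-up time at which the receiver finishes writing
FLUSH_DONE = 10.0   # made-up time at which the receiver finishes flushing

# Replica style: one reply after the write, another after the flush.
replica = [Reply(WRITE_DONE, LSN, 0), Reply(FLUSH_DONE, LSN, LSN)]

# Single-reply style: one reply, sent only once the flush is done.
single_reply = [Reply(FLUSH_DONE, LSN, LSN)]

print(observed_lags(LSN, 0.0, replica))       # (2.0, 10.0)
print(observed_lags(LSN, 0.0, single_reply))  # (10.0, 10.0)
```

With two replies the primary sees distinct write and flush lags; with a
single post-flush reply both columns collapse to the flush lag.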