On Wed, Jul 17, 2019 at 1:52 PM Michael Paquier <mich...@paquier.xyz> wrote:
> I got surprised by the following behavior from pg_stat_get_wal_senders
> when connecting for example pg_receivewal to a primary:
> =# select application_name, flush_lsn, replay_lsn, flush_lag,
> replay_lag from pg_stat_replication;
>  application_name | flush_lsn | replay_lsn |    flush_lag    |   replay_lag
> ------------------+-----------+------------+-----------------+-----------------
>  receivewal       | null      | null       | 00:09:13.578185 | 00:09:13.578185
> (1 row)
>
> It makes little sense to me, as we are reporting a replay lag on a
> position which has never been reported yet, so it cannot actually be
> used as a comparison base for the lag.  Am I missing something or
> should we return NULL for those fields if we have no write, flush or
> apply LSNs like in the attached?
Hmm.  It's working as designed, but indeed it's not very newsworthy information in this case.  If you run pg_receivewal --synchronous then you get sensible-looking flush_lag times.  Without that, flush_lag only goes up, and of course replay_lag only goes up, so although it's telling the truth, I think your proposal makes sense.

One question I had is what would happen with your patch without --synchronous, once it flushes a whole file and opens a new one; I wondered if your new boring-information-hiding behaviour would stop working after one segment file because of that.  I tested that and the column remains NULL when we move to a new file, so that's good.

One thing I noticed in passing is that you always get the same times in the write_lag and flush_lag columns in --synchronous mode, and the times update infrequently.  That's not the case with regular replicas; I suspect there is a difference in the timing and frequency of replies sent to the server, which I guess might make synchronous commit a bit "lumpier", but I didn't dig further today.

-- 
Thomas Munro
https://enterprisedb.com