Hello Laurenz, The above metric works fine for my primary server. However, We have a cascading setup in our production system. This particular query doesn't work for my intermediate server which is standby to the primary server but also a master to one more standby server. We get the following error:
prod=# SELECT round(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) / (1024 :: NUMERIC ^ 2), 2 ) AS replay_lag FROM pg_stat_replication where application_name like 'rtv%'; ERROR: recovery is in progress HINT: WAL control functions cannot be executed during recovery. I wanted to also track the transfer/replay lag between the intermediate server and the final standby. I narrowed down the problem with pg_current_wal_lsn() procedure. Any idea on how to mitigate this? or any alternative to the pg_current_wal_lsn() procedure that I can use to get transfer lag? Thanks, Viral Shah On Fri, Apr 16, 2021 at 7:52 PM Viral Shah <vs...@nodalexchange.com> wrote: > Hello Laurenz, > > Thank you so much for sending the query. It was exactly what I needed. I > just made 1 modification to beautify the transfer and replay lag and I can > see the size in bytes. > > SELECT application_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), > flush_lsn)) AS transfer_lag, > pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS > replay_lag > FROM pg_stat_replication; > > I am now using zabbix to constantly monitor them and notify myself if it > breaches a certain threshold. > > > Thanks again! > > > Best, > > Viral Shah > > Nodal Exchange LLC > > > > On Thu, Apr 15, 2021 at 8:10 AM Laurenz Albe <laurenz.a...@cybertec.at> > wrote: > >> On Wed, 2021-04-14 at 17:50 -0400, Viral Shah wrote: >> > We have a PostgreSQL 10.12 cluster of servers in two different data >> centers. >> > Off lately, in the case of a large WAL generation, we are seeing >> replication >> > delay between the master and the standby server. These delays have off >> lately >> > been there for an unusually long time. I was wondering if we have any >> metric >> > that can calculate the amount (size) of WAL transfer left between >> master and >> > standby? >> > >> > PS: We have ensured we have upgraded our firewalls for better speed >> transfer. >> > >> > Any help on how to figure out the slowness in the WAL transfer would be >> much appreciated. >> >> SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS transfer_lag, >> pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag >> FROM pg_stat_replication; >> >> If both are delayed, it might be that the network cannot cope. >> >> If only the second number is delayed, you have replication conflicts >> with queries on the standby. >> >> Yours, >> Laurenz Albe >> -- >> Cybertec | https://www.cybertec-postgresql.com >> >>