Hello Laurenz,

The above metric works fine for my primary server. However, We have a
cascading setup in our production system. This particular query doesn't
work for my intermediate server which is standby to the primary server but
also a master to one more standby server. We get the following error:

prod=# SELECT round(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
/ (1024 :: NUMERIC ^ 2), 2 ) AS replay_lag FROM pg_stat_replication
where application_name like 'rtv%';
ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.


I wanted to also track the transfer/replay lag between the intermediate
server and the final standby. I narrowed down the problem with
pg_current_wal_lsn() procedure. Any idea on how to mitigate this? or any
alternative to the pg_current_wal_lsn() procedure that I can use to get
transfer lag?

Thanks,
Viral Shah


On Fri, Apr 16, 2021 at 7:52 PM Viral Shah <vs...@nodalexchange.com> wrote:

> Hello Laurenz,
>
> Thank you so much for sending the query. It was exactly what I needed. I
> just made 1 modification to beautify the transfer and replay lag and I can
> see the size in bytes.
>
> SELECT application_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), 
> flush_lsn)) AS transfer_lag,
>    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS 
> replay_lag
> FROM pg_stat_replication;
>
> I am now using zabbix to constantly monitor them and notify myself if it 
> breaches a certain threshold.
>
>
> Thanks again!
>
>
> Best,
>
> Viral Shah
>
> Nodal Exchange LLC
>
>
>
> On Thu, Apr 15, 2021 at 8:10 AM Laurenz Albe <laurenz.a...@cybertec.at>
> wrote:
>
>> On Wed, 2021-04-14 at 17:50 -0400, Viral Shah wrote:
>> > We have a PostgreSQL 10.12 cluster of servers in two different data
>> centers.
>> >  Off lately, in the case of a large WAL generation, we are seeing
>> replication
>> >  delay between the master and the standby server. These delays have off
>> lately
>> >  been there for an unusually long time. I was wondering if we have any
>> metric
>> >  that can calculate the amount (size) of WAL transfer left between
>> master and
>> >  standby?
>> >
>> > PS: We have ensured we have upgraded our firewalls for better speed
>> transfer.
>> >
>> > Any help on how to figure out the slowness in the WAL transfer would be
>> much appreciated.
>>
>> SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS transfer_lag,
>>        pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag
>> FROM pg_stat_replication;
>>
>> If both are delayed, it might be that the network cannot cope.
>>
>> If only the second number is delayed, you have replication conflicts
>> with queries on the standby.
>>
>> Yours,
>> Laurenz Albe
>> --
>> Cybertec | https://www.cybertec-postgresql.com
>>
>>

Reply via email to