On Mon, Jul 22, 2019 at 11:28 AM Jehan-Guillaume (ioguix) de Rorthais <iog...@free.fr> wrote:
> Hi,
>
> On Mon, 22 Jul 2019 11:05:57 +0200
> Tiemen Ruiten <t.rui...@tech-lab.io> wrote:
> [...]
> > > Now to my current issue: I took the advice to add more monitoring on
> > > replay lag (using pg_last_xact_replay_timestamp) and things are not looking
> > > good. Last night replication lagged by almost 6 hours on one of the
> > > nodes[3], but eventually caught up. As you can see in that screenshot,
> > > ph-sql-03 is consistently slower to replay than ph-sql-05 (ph-sql-04 is the
> > > current master) and there happen to be different SSDs in ph-sql-03
> > > (Crucial MX300 vs Crucial MX500 in the other two), which makes me think
> > > this is IO related.
>
> Such a difference is quite surprising. Moreover, I suppose you have some
> caching in front of the disks (either RAID or SAN?). Could you describe your
> disk stack in more detail?
>
> Do you have any detailed metrics about disk and network IO to share?
>
> Is the network the same for both nodes?

Hi!

All nodes have only local storage. The filesystem is ZFS, which has its own built-in caching (ARC/L2ARC and the ZIL). There are no hardware caching devices present. The disks are connected through a Supermicro AOC-S3008L-L8E HBA (shown by lspci as LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)).

This is the layout of the pool:

[root@ph-sql-04 ter]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:07:51 with 0 errors on Tue Jul 16 03:17:52 2019
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

I've attached a graph of network IO on all servers. The network config is identical for all three nodes: 2x bonded gigabit connections to the same stacked switch pair.
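For reference, a minimal sketch (not from the thread; the timestamps below are made up for illustration) of how a lag check built on pg_last_xact_replay_timestamp() typically computes the number reported above. One known caveat: when the primary is idle the timestamp stops advancing, so apparent "lag" can grow even though the standby is fully caught up.

```python
from datetime import datetime, timezone

def replay_lag_seconds(now: datetime, last_replay: datetime) -> float:
    """Seconds between the monitoring clock and the last replayed commit.

    Clamped to zero so minor clock skew between the nodes does not
    produce a negative lag value.
    """
    return max((now - last_replay).total_seconds(), 0.0)

# Hypothetical sample values: a standby whose last replayed commit is
# almost six hours behind the monitoring host's clock.
now = datetime(2019, 7, 22, 11, 28, tzinfo=timezone.utc)
last_replay = datetime(2019, 7, 22, 5, 30, tzinfo=timezone.utc)
print(replay_lag_seconds(now, last_replay))  # 21480.0 (5h58m)
```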
Currently I don't have many metrics on raw disk IO; I can add some monitoring items and report those tomorrow if that would be useful. I do have a lot of metrics available related to PostgreSQL (buffers written/read, checkpoints, rows fetched/updated/inserted, etc.).
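As a starting point for those raw disk IO metrics, here is a hypothetical sketch (the sample line is invented) of pulling per-device sector counters from a /proc/diskstats-style line; field positions follow the Linux diskstats format (field 3 = device name, field 6 = sectors read, field 10 = sectors written).

```python
# Invented sample line in /proc/diskstats format, used instead of reading
# the real file so the sketch is self-contained.
sample = "   8       0 sda 4173 213 1137299 4542 13111 12098 3418218 46521 0 13685 51036"

def parse_diskstats_line(line: str) -> dict:
    """Extract device name and cumulative sector counters from one line."""
    f = line.split()
    return {
        "device": f[2],            # field 3: device name
        "sectors_read": int(f[5]), # field 6: sectors read
        "sectors_written": int(f[9]),  # field 10: sectors written
    }

stats = parse_diskstats_line(sample)
print(stats["device"], stats["sectors_read"], stats["sectors_written"])
```

Sampling these counters at an interval and taking deltas gives per-disk throughput, which would make it easy to compare the MX300 mirror members on ph-sql-03 against the MX500s on the other nodes.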