To detect network issues, maybe you could monitor replication delay instead.

On Mon, May 13, 2019 at 6:42 AM <ayaho...@ibagroup.eu> wrote:
> Hello PostgreSQL Community!
>
> I faced an issue on my Linux machine using Postgres 11.3.
> I have 2 nodes in the db cluster: master and standby.
> I tried to run a lot of long-running queries, which led to the
> databases becoming desynchronized:
> terminating walsender process due to replication timeout
>
> Here is the output in debug mode:
> 2019-05-13 13:21:33 FET 00000 DEBUG: sending replication keepalive
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
> 2019-05-13 13:21:34 FET 00000 LOG: terminating walsender process due to replication timeout
>
> The issue is reproducible. I configure a 2-node cluster, download
> demo_small.zip from https://edu.postgrespro.ru/ and run the following
> command:
> psql -U user1 -f demo_small.sql db1
> and I get the observed behaviour.
>
> I know that I can increase the wal_sender_timeout value to avoid this
> behaviour (currently wal_sender_timeout is set to 1 second).
> To be honest, I don't want to increase wal_sender_timeout because I would
> like to detect network issues quickly.
>
> After some googling I found that someone faced a similar issue
> https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com
> which was fixed in PostgreSQL 9.4.16.
>
> Is my issue the same as the one described here
> https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com
> ?
> Is there any other way to avoid it without increasing
> wal_sender_timeout?
>
> Thank you in advance.
> Regards,
> Andrei

--
Genius is 1% inspiration and 99% perspiration.
Thomas Alva Edison
http://pglearn.blogspot.mx/
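As a rough illustration of the replication-delay monitoring suggested at the top of this reply, here is a minimal Python sketch. It assumes PostgreSQL 10 or later on the primary (where pg_stat_replication has a replay_lsn column and pg_current_wal_lsn() exists), that the query rows are fetched by some driver not shown here, and that each standby connects with a distinct application_name. The names LAG_QUERY, MAX_LAG_BYTES, parse_lsn and check_standbys, and the 16 MiB threshold, are hypothetical choices for this example. A standby with no row in pg_stat_replication has no connected walsender, which is how the sketch reports a disconnection.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical query to run on the primary (PostgreSQL 10+).
# pg_stat_replication has one row per connected walsender;
# casting pg_lsn to text yields values like '0/3000060'.
LAG_QUERY = """
SELECT application_name,
       pg_current_wal_lsn()::text AS current_lsn,
       replay_lsn::text           AS replay_lsn
FROM pg_stat_replication
"""

# Hypothetical alert threshold: 16 MiB of WAL not yet replayed.
MAX_LAG_BYTES = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn text value 'HIGH/LOW' (hex) into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def check_standbys(
    rows: Iterable[Tuple[str, str, Optional[str]]],
    expected: Iterable[str],
    max_lag_bytes: int = MAX_LAG_BYTES,
) -> List[str]:
    """Return human-readable problems found in rows produced by LAG_QUERY."""
    problems: List[str] = []
    seen = set()
    for name, current_lsn, replay_lsn in rows:
        seen.add(name)
        if replay_lsn is None:
            # Connected, but the standby has not reported a replay position yet.
            problems.append(f"{name}: no replay position reported")
            continue
        # Same byte difference that pg_wal_lsn_diff() computes on the server.
        lag = parse_lsn(current_lsn) - parse_lsn(replay_lsn)
        if lag > max_lag_bytes:
            problems.append(f"{name}: {lag} bytes behind")
    for name in sorted(set(expected) - seen):
        # No walsender row at all: the standby is not connected.
        problems.append(f"{name}: not connected")
    return problems


if __name__ == "__main__":
    rows = [("standby1", "0/5000000", "0/3000000")]
    print(check_standbys(rows, expected=["standby1", "standby2"]))
    # prints ['standby1: 33554432 bytes behind', 'standby2: not connected']
```

Polling a check like this every few seconds is one possible way to notice a dropped or stalled standby without relying on a 1-second wal_sender_timeout. In PostgreSQL 10 and later, pg_stat_replication also exposes write_lag, flush_lag and replay_lag as time intervals, which may be easier to alert on than byte counts.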