Hello,

Thank you for the response. Yes, it is possible to monitor replication delay, but my questions were not about monitoring network issues.
I use exactly wal_sender_timeout=1s because it allows me to detect replication problems quickly. So I need clarification on the following questions:

Is it possible to use exactly this configuration and be sure that it will work properly?
What did I do wrong? Should I correct my configuration somehow?
Is this the same issue as the one mentioned here: https://www.postgresql.org/message-id/[email protected] ?
If so, why do I face this problem again?

Thank you in advance.

Best regards,
Andrei

From: Rene Romero Benavides <[email protected]>
To: [email protected],
Cc: Postgres General <[email protected]>
Date: 14/05/2019 20:12
Subject: Re: terminating walsender process due to replication timeout

To detect network issues maybe you could monitor replication delay.

On Mon, May 13, 2019 at 6:42 AM <[email protected]> wrote:

Hello PostgreSQL Community!

I faced an issue on my Linux machine using Postgres 11.3. I have 2 nodes in the db cluster: master and standby. I ran a large number of long-running queries, which led to the databases going out of sync:

terminating walsender process due to replication timeout

Here is the output in debug mode:

2019-05-13 13:21:33 FET 00000 DEBUG: sending replication keepalive
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG: CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 LOG: terminating walsender process due to replication timeout

The issue is reproducible. I configure a 2-node cluster, download demo_small.zip from https://edu.postgrespro.ru/ and run the following command:

psql -U user1 -f demo_small.sql db1

and I get the observed behaviour.

I know that I can increase the wal_sender_timeout value to avoid this behaviour (currently wal_sender_timeout is equal to 1 second). To be honest, I don't want to increase wal_sender_timeout because I would like to detect network issues quickly.

After some googling I found that someone faced a similar issue https://www.postgresql.org/message-id/[email protected] which was fixed in PostgreSQL 9.4.16. Is my issue the same as the one described here https://www.postgresql.org/message-id/[email protected] ? Is there any other way to avoid it without increasing wal_sender_timeout?

Thank you in advance.

Regards,
Andrei

--
Genius is 1% inspiration and 99% perspiration.
Thomas Alva Edison
http://pglearn.blogspot.mx/
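
As a footnote to the suggestion in this thread to monitor replication delay, here is a minimal sketch of how that lag could be computed from values read out of pg_stat_replication on the master. It is not taken from either poster's setup. The helper names lsn_to_int and replication_lag_bytes and the constant LAG_QUERY are hypothetical; the column names sent_lsn, replay_lsn and replay_lag are those used by PostgreSQL 10 and later. The sketch only does the arithmetic on LSN strings, so running the query and fetching the rows is left to whatever client is already in use.

```python
# Query one might run on the master to read replication positions.
# Column names are those of pg_stat_replication in PostgreSQL 10+.
LAG_QUERY = (
    "SELECT application_name, sent_lsn, replay_lsn, replay_lag "
    "FROM pg_stat_replication;"
)


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent by the master but not yet replayed on the standby."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)
```

A monitoring job could call replication_lag_bytes on each row returned by LAG_QUERY and raise an alert when the result stays above a chosen threshold. This complements wal_sender_timeout rather than replacing it, and it does not answer whether a 1 second timeout is safe to use.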
