terminating walsender process due to replication timeout

AYahorau Mon, 13 May 2019 04:43:24 -0700

Hello PostgreSQL Community!

I ran into an issue on my Linux machine running PostgreSQL 11.3.
I have two nodes in the DB cluster: a master and a standby.
I ran a large number of long-running queries, which caused the 
databases to fall out of sync, with the following message in the log:
terminating walsender process due to replication timeout


Here is the output in debug mode:
2019-05-13 13:21:33 FET 00000 DEBUG:  sending replication keepalive
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; 
blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; 
blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 LOG:  terminating walsender process due to 
replication timeout


The issue is reproducible. I configure a two-node cluster, download 
demo_small.zip from https://edu.postgrespro.ru/, and run the following 
command:
psql -U user1 -f demo_small.sql db1
after which I observe the behaviour described above.


I know that I can increase the wal_sender_timeout value to avoid this 
behaviour (wal_sender_timeout is currently set to 1 second).
However, I would rather not increase wal_sender_timeout, because I want 
to detect network issues quickly.
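
To make the trade-off concrete, here is a rough sketch of the timing 
involved. It is a simplified model, not PostgreSQL's actual code: it 
assumes only that the sender gives up once more than the timeout has 
passed without a reply from the standby. The function name and the reply 
timestamps are hypothetical, chosen for illustration.

```python
from typing import Optional, Sequence


def first_timeout(reply_times_ms: Sequence[int],
                  timeout_ms: int,
                  start_ms: int = 0) -> Optional[int]:
    """Return the time (ms) at which the modelled sender gives up, or None.

    Simplified model (an assumption, not the real walsender logic):
    the sender gives up once more than `timeout_ms` passes between
    two consecutive replies. Only gaps between observed replies are
    checked; the period after the last reply is ignored.
    """
    last_reply = start_ms
    for t in sorted(reply_times_ms):
        if t - last_reply > timeout_ms:
            # The deadline expired before this reply arrived.
            return last_reply + timeout_ms
        last_reply = t
    return None


# Hypothetical timestamps: the standby is busy, so its third reply is late.
replies = [400, 900, 2300, 2700]

print(first_timeout(replies, timeout_ms=1000))  # 1900
print(first_timeout(replies, timeout_ms=5000))  # None
```

In this model, a single reply delayed by 1.4 seconds is enough to trip a 
1-second timeout, while a 5-second timeout tolerates the same delay at 
the cost of slower failure detection.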

After some searching, I found that someone had faced a similar issue:
https://www.postgresql.org/message-id/e082a56a-fd95-a250-3bae-0fff93832...@2ndquadrant.com
which was fixed in PostgreSQL 9.4.16.


Is my issue the same as the one described in the thread linked above?
Is there any other way to avoid it without increasing 
wal_sender_timeout?


Thank you in advance.
Regards, 
Andrei
