Re: terminating walsender process due to replication timeout

AYahorau Wed, 15 May 2019 01:44:43 -0700

Hello,
Thank you for the response.

Yes, it is possible to monitor replication delay, but my questions were
not about monitoring network issues.


I use wal_sender_timeout=1s specifically because it lets me detect
replication problems quickly.
So I need clarification on the following questions:
Is it possible to use exactly this configuration and be sure that it will
work properly?
What did I do wrong? Should I correct my configuration somehow?
Is this the same issue as the one mentioned here:
https://www.postgresql.org/message-id/[email protected]
 
? If so, why am I facing this problem again?

Thank you in advance.
Best regards,
Andrei




From:   Rene Romero Benavides <[email protected]>
To:     [email protected], 
Cc:     Postgres General <[email protected]>
Date:   14/05/2019 20:12
Subject:        Re: terminating walsender process due to replication timeout



To detect network issues, maybe you could monitor replication delay.
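
A rough sketch of what monitoring replication delay could look like, assuming the two WAL positions are read from the sent_lsn and replay_lsn columns of pg_stat_replication on the master. The helper names and the sample LSN values are made up for illustration; on the server itself, pg_wal_lsn_diff() performs the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent by the master but not yet replayed by the standby."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)


# Hypothetical sample values, not taken from the cluster in this thread.
print(lag_bytes("0/3000148", "0/3000060"))  # 232
```

A lag that keeps growing while the connection stays up points to a slow standby rather than a broken network link, which a timeout alone cannot tell apart.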

On Mon, May 13, 2019 at 6:42 AM <[email protected]> wrote:
Hello PostgreSQL Community! 

I ran into an issue on my Linux machine running Postgres 11.3.
I have 2 nodes in the DB cluster: master and standby.
I tried to run a large number of long-running queries, which led to the
databases getting out of sync:
terminating walsender process due to replication timeout

Here is the output in debug mode: 
2019-05-13 13:21:33 FET 00000 DEBUG:  sending replication keepalive
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 DEBUG:  CommitTransaction(1) name: unnamed; blockState: END; state: INPROGRESS, xid/subid/cid: 0/1/0
2019-05-13 13:21:34 FET 00000 LOG:  terminating walsender process due to replication timeout
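
As a quick check of the timing in the output above, the sketch below measures the gap between the keepalive line and the termination line. The two lines are copied from the log with their wrapped parts joined; the helper names are made up for illustration. The log timestamps have one-second resolution, so the result is only approximate.

```python
from datetime import datetime

# First and last lines of the debug output above (wrapped parts joined).
KEEPALIVE_LINE = (
    "2019-05-13 13:21:33 FET 00000 DEBUG:  sending replication keepalive"
)
TIMEOUT_LINE = (
    "2019-05-13 13:21:34 FET 00000 LOG:  terminating walsender process "
    "due to replication timeout"
)


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS'; the time zone label is ignored."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds from the first log line to the second."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


gap = seconds_between(KEEPALIVE_LINE, TIMEOUT_LINE)
print(gap)  # 1.0
```

The gap is about one second, in line with the wal_sender_timeout of 1 second used in this setup.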


The issue is reproducible. I configure a 2-node cluster, download
demo_small.zip from https://edu.postgrespro.ru/, and run the following
command:
psql -U user1 -f demo_small.sql db1
and I get the behaviour described above.


I know that I can increase the wal_sender_timeout value to avoid this
behaviour (wal_sender_timeout is currently set to 1 second).
To be honest, I don't want to increase wal_sender_timeout because I would
like to detect network issues quickly.

After some searching, I found that someone had faced a similar issue:
https://www.postgresql.org/message-id/[email protected]
 
which was fixed in PostgreSQL 9.4.16.


Is my issue the same as the one described here:
https://www.postgresql.org/message-id/[email protected]
 
? 
Is there any other way to avoid it without increasing
wal_sender_timeout?


Thank you in advance. 
Regards, 
Andrei


-- 
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/
