At Wed, 26 Jan 2022 18:45:53 -0500, Andrew Dunstan <and...@dunslane.net> wrote:
> It's very unlikely any of this is your fault. In any case, intermittent
> failures are very hard to nail down.

The primary started at 2022-01-26 16:30:06.613, then accepted a connection
from the standby at 2022-01-26 16:30:09.911.

P: 2022-01-26 16:30:06.613 UTC [74668:1] LOG:  starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit
S: 2022-01-26 16:30:09.637 UTC [72728:1] LOG:  starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit
P: 2022-01-26 16:30:09.911 UTC [74096:3] [unknown] LOG:  replication connection authorized: user=pgrunner application_name=standby
S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

After this the primary restarts.

P: 2022-01-26 16:30:11.832 UTC [74668:7] LOG:  database system is shut down
P: 2022-01-26 16:30:12.010 UTC [72140:1] LOG:  starting PostgreSQL 15devel on x86_64-w64-mingw32, compiled by gcc.exe (Rev2, Built by MSYS2 project) 10.3.0, 64-bit

But the standby doesn't notice the disconnection, and the test continues
with poll_query_until() while the standby is still on the stale
replication connection. The query should nevertheless have succeeded
eventually, once the standby reconnected. The following standby log
lines are complete, with nothing omitted in between.

S: 2022-01-26 16:30:09.912 UTC [73932:1] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
S: 2022-01-26 16:30:12.825 UTC [72760:1] [unknown] LOG:  connection received: host=127.0.0.1 port=60769
S: 2022-01-26 16:30:12.830 UTC [72760:2] [unknown] LOG:  connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner" method=sspi (C:/tools/msys64/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/modules/commit_ts/tmp_check/t_003_standby_2_standby_data/pgdata/pg_hba.conf:2)
S: 2022-01-26 16:30:12.830 UTC [72760:3] [unknown] LOG:  connection authorized: user=pgrunner database=postgres application_name=003_standby_2.pl
S: 2022-01-26 16:30:12.838 UTC [72760:4] 003_standby_2.pl LOG:  statement: SELECT '0/303C7D0'::pg_lsn <= pg_last_wal_replay_lsn()

Since the standby doesn't receive WAL from the restarted server,
poll_query_until() doesn't return.
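
To illustrate why the wait cannot finish, here is a rough Python sketch
of the polling logic. It is not the actual Perl implementation of
poll_query_until(); parse_lsn() and poll_until() are hypothetical
helpers, and the replay positions are made-up values (only the target
0/303C7D0 and the streaming start point 0/3000000 come from the logs
above). The condition stays false for as long as the replay position
does not reach the target.

```python
import time


def parse_lsn(text):
    """Convert a pg_lsn string such as '0/303C7D0' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def poll_until(check, max_attempts=5, interval=0.0):
    """Call check() until it returns True; give up after max_attempts."""
    for _ in range(max_attempts):
        if check():
            return True
        time.sleep(interval)
    return False


# Target LSN taken from the logged statement.
target = parse_lsn("0/303C7D0")

# Assumed stalled standby: the replay position never advances past the
# point where streaming started, so the condition is never satisfied.
stalled_replay = parse_lsn("0/3000000")
stalled = poll_until(lambda: target <= stalled_replay)

# Assumed healthy standby: the replay position advances on each poll
# (illustrative values only) and eventually reaches the target.
positions = iter(["0/3000000", "0/3020000", "0/303C7D0"])
healthy = poll_until(lambda: target <= parse_lsn(next(positions)))
```

In this sketch the stalled case gives up after a fixed number of
attempts, while the healthy case succeeds on the third poll.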

I'm not sure why the walreceiver of the standby kept running for such a
long time without noticing that the primary had once been dead.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

