At Wed, 28 Sep 2022 08:50:12 +0000, "Lahnov, Igor" <igor.lah...@nexign.com> 
wrote in 
> Hi,
> After failover all stand by nodes could not start streaming wal recovery.
> Streaming recovery start from 1473/A5000000, but standby start at 
> 1473/A5FFEE08, this seems to be the problem.

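It's not a problem at all. It is quite normal for a standby to start
streaming from the beginning of the WAL segment that contains its
current position; 1473/A5FFEE08 lies in the segment that starts at
1473/A5000000.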
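
As a rough illustration of that arithmetic, here is a minimal sketch
that rounds an LSN down to its segment boundary. It assumes the default
16MB wal_segment_size; the function names are made up for this example
and are not part of PostgreSQL.

```python
# Assumption: the cluster uses the default wal_segment_size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '1473/A5FFEE08' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(pos: int) -> str:
    """Format a byte position as high/low hex halves (low half zero-padded)."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:08X}"


def segment_start(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Round an LSN down to the first byte of the segment containing it."""
    pos = parse_lsn(lsn)
    return format_lsn(pos - pos % seg_size)


print(segment_start("1473/A5FFEE08"))  # prints 1473/A5000000
```

With a non-default segment size, pass it as seg_size instead.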

> What can we do in this case to restore?
> Is it possible to shift wal streaming recovery point on primary?
> Can checkpoint on primary help in this situation?


> 2022-09-26 14:08:23.672  [3747868]     LOG:  started streaming WAL from 
> primary at 1473/A5000000 on timeline 18
> 2022-09-26 14:08:24.363  [3747796]     LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747868]     FATAL:  terminating walreceiver 
> process due to administrator command

This seems to mean someone emptied primary_conninfo.

> 2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0
> 2022-09-26 14:08:24.366  [3747796]     LOG:  invalid record length at 
> 1473/A5FFEE08: wanted 24, got 0

I don't fully understand the situation. One scenario I can come up
with that leads to this state is that the standby somehow restored an
incomplete WAL segment from the primary. For example, someone copied
the currently active WAL file from pg_wal to the archive on the
primary, or restore_command on the standby fetches WAL files from
pg_wal on the primary instead of from its archive. Neither is a normal
operation.
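
To find the segment file worth inspecting in the archive, the file name
can be derived from the timeline and the LSN in the log. The following
is only a sketch under the assumption of the default 16MB
wal_segment_size; wal_file_name is a name chosen for this example.

```python
# Assumption: the cluster uses the default wal_segment_size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline: int, lsn: str,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the 24-hex-digit WAL file name for the segment containing lsn."""
    high, low = lsn.split("/")
    pos = (int(high, 16) << 32) | int(low, 16)
    segno = pos // seg_size                    # global segment number
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4GB of WAL
    return (f"{timeline:08X}"
            f"{segno // segs_per_xlogid:08X}"
            f"{segno % segs_per_xlogid:08X}")


# Timeline 18 and the failing LSN are taken from the log above.
print(wal_file_name(18, "1473/A5FFEE08"))  # prints 0000001200001473000000A5
```

Comparing that file in the archive with the copy in pg_wal on the
primary may show whether the archived copy was taken before the segment
was completely written.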

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

