At Fri, 23 Feb 2024 04:04:03 +0000, Mark Schloss <mark.schl...@austrac.gov.au> 
wrote in 
> <2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG:  started 
> streaming WAL from primary at 6/B0000000 on timeline 5
> <2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG:  invalid 
> magic number 0000 in log segment 0000000500000006000000B0, offset 0

This appears to suggest that the WAL file that the standby fetched was
zero-filled on the primary side, which cannot happen by a normal
operation. A preallocated WAL segment can be zero-filled but it cannot
be replicated under normal operations.

> <2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL:  terminating 
> walreceiver process due to administrator command

This may suggest a config reload with some parameter changes.

One possible scenario matching the log lines could be that someone
switched primary_conninfo to a newly-restored primary. However, if the
new primary had older data than the previously connected primary,
possibly leading to the situation where the segment 0..5..6..B0 on it
was a preallocated one that was filled with zeros, the standby could
end up fetching the zero-filled WAL segment (file) and might fail this
way. If this is the case, such operations should be avoided.

Unfortunately, due to the lack of a reproducer or detailed operations
that took place there, the best I can do now is to guess a possible
scenario as described above. I'm not sure how come the situation
actually arose.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Reply via email to