Hi,

On 2021-05-04 09:46:12 -0400, Tom Lane wrote:
> Yeah, I have also spent a fair amount of time trying to reproduce it
> elsewhere, without success so far. Notably, I've been trying on a
> PPC Mac laptop that has a fairly similar CPU to what's in the G4,
> though a far slower disk drive. So that seems to exclude theories
> based on it being PPC-specific.
>
> I suppose that if we're unable to reproduce it on at least one other box,
> we have to write it off as hardware flakiness.
I wonder if there's a chance that what we're seeing is an OS memory
ordering bug, or a race between the walreceiver writing data and the
startup process reading it.

When the startup process is able to keep up, there will often be only a
very small time delta between the walreceiver writing a page and the
startup process reading it. And if the page currently being read is the
tail page written to by a 'w' message, it'll often be written to again in
short order, potentially while the startup process is reading it. It
wouldn't terribly surprise me if an old OS version on an old processor had
some issues around that.

Were there any cases of walsender terminating and reconnecting around the
failures?

It looks suspicious that XLogPageRead() does not invalidate the xlogreader
state when retrying. Normally that's xlogreader's responsibility, but there
is that whole XLogReaderValidatePageHeader() business. But I don't quite
see how it'd actually cause problems.

Greetings,

Andres Freund