On 2013-12-29 02:48:21 -0500, Tom Lane wrote:
> 4. The server tries to start, and fails because it can't find a WAL file
> containing the last checkpoint record. This is pretty unsurprising given
> the facts above. The reason you don't see any "no such file" report is
> that XLogFileRead() will report any BasicOpenFile() failure *other than*
> ENOENT. And nothing else makes up for that.
>
> Re point 4: the logic, if you can call it that, in xlog.c and xlogreader.c
> is making my head spin. There are about four levels of overcomplicated
> and undercommented code before you ever get down to XLogFileRead, so I
> have no idea which level to blame for the lack of error reporting in this
> specific case. But there are pretty clearly some cases in which ignoring
> ENOENT in XLogFileRead isn't such a good idea, and XLogFileRead isn't
> being told when to do that or not.

Yes, that code is pretty horrid. In Heikki's and my defense, I don't think
the xlogreader.c split had much to do with it, though. I think the path that
errors out is essentially:

ReadRecord()->XLogReadRecord()*->ReadPageInternal()*->XLogPageRead()
->WaitForWALToBecomeAvailable()->XLogFileReadAnyTLI()->XLogFileRead()

The *ed functions are new, but it's really code that was in ReadRecord()
before. So I don't think too much has changed since 9.0ish, although the
timeline switch didn't make it simpler.

As far as I can tell, XLogFileRead() actually is told when it's OK to ignore
an error: that is what the notfoundOK parameter is for. It's just that we're
always passing true for it if we're not streaming...

I think it might be sufficient to make passing that flag additionally
conditional on fetching_ckpt. That is already passed to
WaitForWALToBecomeAvailable(), so we'd just need to add it to
XLogFileReadAnyTLI().

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
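
To make the suggestion above a bit more concrete, here is a rough standalone sketch of the idea. It is a toy model, not a patch against xlog.c: the toy_* functions and the reports counter are made up for illustration, and it ignores timelines, the archive and streaming entirely. It only shows a "missing file is OK" flag being derived from fetching_ckpt, so that ENOENT gets reported when we are looking for the checkpoint record.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for XLogFileRead(): try to open a segment file.
 * Any failure is reported, except ENOENT when the caller has said that a
 * missing file is acceptable.  *reports counts the messages emitted.
 */
static int
toy_xlog_file_read(const char *path, bool notfound_ok, int *reports)
{
    int fd;

    assert(path != NULL && reports != NULL);

    fd = open(path, O_RDONLY);
    if (fd >= 0)
        return fd;

    if (errno != ENOENT || !notfound_ok)
    {
        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        (*reports)++;
    }
    return -1;
}

/*
 * Hypothetical stand-in for XLogFileReadAnyTLI() with the extra argument.
 * Assumption of this sketch: a missing file is only tolerated silently
 * when we are not fetching the checkpoint record.
 */
static int
toy_xlog_file_read_any_tli(const char *path, bool fetching_ckpt, int *reports)
{
    bool notfound_ok = !fetching_ckpt;

    return toy_xlog_file_read(path, notfound_ok, reports);
}
```

With a nonexistent path, calling toy_xlog_file_read_any_tli() with fetching_ckpt = false fails silently as before, while fetching_ckpt = true produces a "could not open file" message. Whether the real code can be that simple depends on how XLogFileReadAnyTLI() probes the individual timelines, which this sketch does not attempt to model.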