On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug <f...@phlo.org> wrote: > On Oct26, 2011, at 15:12 , Simon Riggs wrote: >> On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk <ai...@highrise.ca> wrote: >> >>> The read fails because their is no data at the location it's trying to >>> read from, because clog hasn't been extended yet by recovery. >> >> You don't actually know that, though I agree it seems a reasonable >> guess and was my first thought also. > > The actual error message also supports that theory. Here's the relevant > snippet from the OP's log (Found in > ca9fd2fe.1d8d2%linas.virba...@continuent.com) > > 2011-09-21 13:41:05 CEST FATAL: could not access status of transaction > 1188673 > 2011-09-21 13:41:05 CEST DETAIL: Could not read from file "pg_clog/0001" at > offset 32768: Success. > > Note that it says "Success" at the end of the second log entry. That > can only happen, I think, if we're trying to read the page adjacent to > the last page in the file. The seek would be successfull, and the subsequent > read() would indicate EOF by returning zero bytes. None of the calls would > set errno. If there was a real IO error, read() would set errno, and if the > page wasn't adjacent to the last page in the file, seek() would set errno. > In both cases we'd see the corresponding error messag, not "Success".
And even more pointedly, in the original go around on this: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056 He reported that clog/0000 after pg_start_backup call: -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000 Changed during the rsync phase to this: -rw------- 1 postgres postgres 16384 Sep 23 14:33 0000 But on the slave, of course, it was copied before it was extend so it was the original size (that's ok, that's the point of recovery after the backup): -rw------- 1 postgres postgres 8192 Sep 23 14:31 0000 With the error: 2011-09-23 14:33:46 CEST FATAL: could not access status of transaction 37206 2011-09-23 14:33:46 CEST DETAIL: Could not read from file "pg_clog/0000" at offset 8192: Success. And that error happens *before* recovery even can get attempted. And that if he copied the "recent" clog/0000 from the master, it did start up. And I think they also reported that if they didn't run hot standby, but just normal recovery into a new master, it didn't have the problem either, i.e. without hotstandby, recovery ran, properly extended the clog, and then ran as a new master fine. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers