>>> VACUUM FULL - immediate shutdown - problem with recovery?
>
> An immediate shutdown == an intentional crash.  OK, so you have the
> VACUUM FULL and the immediate shutdown just afterward.  So we just
> need to figure out what happened during recovery.
>
Right.

>> But WAL replay should still have handled this. I would presume even an
>> immediate shutdown ensures that WAL is flushed to disk properly?
>
> I'm not sure, but I doubt it.  If the VACUUM FULL committed, then the
> WAL records should be on disk, but if the immediate shutdown happened
> while it was still running, then the WAL records might still be in
> wal_buffers, in which case I don't think they'll get written out and
> thus zero pages in the index are to be expected.  Now that doesn't
> explain any other corruption in the file, but I believe all-zeroes
> pages in a relation are an expected consequence of an unclean
> shutdown.  But assuming the VF actually committed before the immediate
> shutdown, there must be something else going on, since by that point
> XLOG should have been flushed.
>

Oh yeah, so if VF committed, the xlog should have been ok too, but
can't say the same about the shared buffers.

>> So that means that either there is a corner case bug in VF which adds
>> incorrect WAL logging in some specific btree layout scenarios or there
>> was indeed some bit flipping in the WAL, which caused the recovery to
>> prematurely end during WAL replay. What are the scenarios that you
>> would think can cause WAL bit flipping?
>
> Some kind of fluke hard drive malfunction, maybe?  I know that the
> incidence of a hard drive being told to write A and actually writing B
> is very low, but it's probably not exactly zero.  Do you have the logs
> from the recovery following the immediate shutdown?  Anything
> interesting there?
>

Unfortunately we do not have the recovery logs. Would have loved to
see some signs about some issues in the WAL replay to confirm the
theory about bit flipping..

> Or, as you say, there could be a corner-case VF bug.
>

Yeah, much harder to find by just eyeballing the code I guess :)

Regards,
Nikhils

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers