On Mon, 2013-04-01 at 10:37 -0700, Jeff Janes wrote:
> Over 10,000 cycles of crash and recovery, I encountered two cases of
> checksum failures after recovery, example:
>
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
> failed, calculated checksum 7017 but expected 1098
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block
> 77 of relation base/16384/2088965
> 14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count)
> from foo

It would be nice to know whether that's an index or a heap page.

> In both cases, the bad block (77 in this case) is the same block that
> was intentionally partially-written during the "crash". However, that
> block should have been restored from the WAL FPW, so its fragmented
> nature should not have been present in order to be detected. Any idea
> what is going on?

Not right now. My primary suspect is what's going on in
visibilitymap_set() and heap_xlog_visible(), which is more complex than
some of the other code. That would require some VACUUM activity, which
isn't in your workload -- do you think autovacuum may kick in sometimes?

Thank you for testing! I will try to reproduce it, as well.

Regards,
Jeff Davis