On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:
> > Hi,
> >
> > I found that the regression test failed when I created the database
> > cluster with the checksum and set wal_level to archive. I think that
> > there are some bugs around checksum feature. Attached is the
> > regression.diff.
>
> Thank you for the report. This was a significant oversight, but simple
> to diagnose and fix.
>
> There were several places that were doing something like:
>
>     PageSetChecksumInplace
>     if (use_wal)
>         log_newpage
>     smgrextend
>
> Which is obviously wrong, because log_newpage sets the LSN of the page,
> invalidating the checksum. We need to set the checksum after
> log_newpage.
>
> Also, I noticed that copy_relation_data was doing smgrread without
> validating the checksum (or page header, for that matter), so I also
> fixed that.
>
> Patch attached. Only brief testing done, so I might have missed
> something. I will look more closely later.

After applying your patch, I could run the stress test described here:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

But altered to make use of initdb -k, of course.

Over 10,000 cycles of crash and recovery, I encountered two cases of
checksum failures after recovery, for example:

14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING:  page verification failed, calculated checksum 7017 but expected 1098
14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR:  invalid page in block 77 of relation base/16384/2088965
14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT:  select sum(count) from foo

In both cases, the bad block (77 in this case) is the same block that
was intentionally partially written during the "crash". However, that
block should have been restored from the WAL full-page write (FPW), so
its fragmented state should not have been present to be detected.

Any idea what is going on? Unfortunately I already cleaned up the data
directory before noticing the problem, so I have nothing to post for
forensic analysis.
I'll try to reproduce the problem.

Without the initdb -k option, I ran the test for 30,000 cycles and found
no problems. I don't think that is because the problem exists but goes
undetected, as the test is designed to catch such problems: if a block
is fragmented but not overwritten by a WAL FPW, that should occasionally
lead to detectably inconsistent tuples.

I don't think your patch caused this particular problem; it merely fixed
a problem that was previously preventing me from running my test.

Cheers,

Jeff