On Fri, Apr 5, 2013 at 7:23 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Tue, 2013-03-26 at 03:34 +0200, Ants Aasma wrote:
>> The main thing to look out for is that we don't
>> have any blind spots for conceivable systemic errors. If we decide to
>> go with the SIMD variant then I intend to figure out what the blind
>> spots are and show that they don't matter.
>
> Are you still looking into SIMD? Right now, it's using the existing CRC
> implementation. Obviously we can't change it after it ships. Or is it
> too late to change it already?
Yes, I just managed to free up some time so I can look at it some more. I was hoping that someone would weigh in on the performance/effectiveness trade-off, and on the fact that we need to use assembler to make it fly, so I would know how to go forward.

The worst blind spot that I could come up with is an even number of single-bit errors that all fall on the least significant bit of a 16-bit word. This type of error can occur in memory chips when row lines go bad, usually stuck at zero or one. The SIMD checksum has a 50% chance of detecting such errors (assuming a reasonably uniform distribution of 1 and 0 bits in the low-order bit). On the other hand, anyone who cares about data integrity should be running ECC-protected memory anyway, making this particular error unlikely in practice. Otherwise the algorithm seems reasonably good: it detects transpositions, zeroed-out ranges and other such common errors. It is especially good on localized errors, detecting all single-bit errors.

I wrote a quick test harness to empirically measure the effectiveness of the hash function. As test data I loaded an imdb dataset dump into master and then concatenated everything in the database datadir except pg_* together, for a total of 2.8GB of data. The test cases I have tried so far are: randomized bit flips (1..4 per page), writing a 0x00 or 0xFF byte into each location on the page (one-byte errors), zeroing out the end of the page starting from a random location, and writing a segment of random garbage into the page. The partial-write and bit-flip tests were repeated 1000 times per page.
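For anyone who wants to play with this, a toy harness along these lines is easy to build. Below is a hypothetical sketch (names like `checksum`, `miss_rate` and the error-model functions are mine, not from the patch): it models the multiply-xor idea as a single sequential 32-bit FNV-1a-style accumulator over 16-bit words. The real patch runs many accumulators in parallel with SIMD, and that lane structure is where the LSB blind spot comes from, so the rates this toy produces will not match the figures from the 2.8GB corpus below.

```python
import random

# Hypothetical simplified model of a multiply-xor (FNV-1a-style) checksum:
# one sequential 32-bit accumulator over 16-bit words. NOT the patch's
# actual SIMD algorithm, just a sketch of the testing methodology.
FNV_PRIME = 16777619        # odd, so multiplication is invertible mod 2**32
MASK32 = 0xFFFFFFFF

def checksum(words):
    acc = 0x811C9DC5        # FNV-1a offset basis
    for w in words:
        acc = ((acc ^ w) * FNV_PRIME) & MASK32
    return acc

def miss_rate(corrupt, trials=200, page_words=256):
    """Monte-Carlo estimate of the fraction of corrupted pages whose
    checksum still matches the original (i.e. undetected errors).
    Real pages are 8kB (4096 words); smaller defaults keep this fast."""
    rng = random.Random(42)
    misses = 0
    for _ in range(trials):
        page = [rng.getrandbits(16) for _ in range(page_words)]
        good = checksum(page)
        if checksum(corrupt(page[:], rng)) == good:
            misses += 1
    return misses / trials

def single_bit_flip(page, rng):
    # Flip one random bit somewhere on the page.
    page[rng.randrange(len(page))] ^= 1 << rng.randrange(16)
    return page

def paired_lsb_flip(page, rng):
    # The systemic worst case discussed above: an even number of errors,
    # each on the least significant bit of a 16-bit word (stuck row line).
    i, j = rng.sample(range(len(page)), 2)
    page[i] ^= 1
    page[j] ^= 1
    return page

if __name__ == "__main__":
    print("single bit flip miss rate:", miss_rate(single_bit_flip))
    print("paired LSB flip miss rate:", miss_rate(paired_lsb_flip))
```

In this sequential full-width model every step is a bijection on the accumulator (xor is invertible, and multiplying by an odd constant is invertible mod 2**32), so any single corrupted word is always detected; the parallel 16-bit-lane variant gives up some of that, which is exactly the trade-off being measured.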
The results so far:

Test             Detects       Miss rate
----------------------------------------
Single bit flip  100.000000%   1:inf
Double bit flip   99.230267%   1:130
Triple bit flip   99.853346%   1:682
Quad bit flip     99.942418%   1:1737
Write 0x00 byte   99.999999%   1:148602862
Write 0xFF byte   99.999998%   1:50451919
Partial write     99.922942%   1:12988
Write garbage     99.998435%   1:63885

Unless somebody tells me not to waste my time, I'll go ahead and come up with a workable patch by Monday.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de