On Fri, Apr 5, 2013 at 7:23 PM, Jeff Davis <pg...@j-davis.com> wrote:
> On Tue, 2013-03-26 at 03:34 +0200, Ants Aasma wrote:
>> The main thing to look out for is that we don't
>> have any blind spots for conceivable systemic errors. If we decide to
>> go with the SIMD variant then I intend to figure out what the blind
>> spots are and show that they don't matter.
>
> Are you still looking into SIMD? Right now, it's using the existing CRC
> implementation. Obviously we can't change it after it ships. Or is it
> too late to change it already?
Yes, I just managed to free up some time so I can look at it some more. I was hoping that someone would weigh in on the performance/effectiveness trade-off, and on the fact that we need to use assembler to make it fly, so I would know how to go forward.

The worst blind spot that I could come up with is an even number of single-bit errors that all fall on the least significant bit of a 16-bit word. This type of error can occur in memory chips when row lines go bad, usually stuck at zero or one. The SIMD checksum has a 50% chance of detecting such errors (assuming a reasonably uniform distribution of 1 and 0 bits in the low-order bit). On the other hand, anyone who cares about data integrity should be running ECC-protected memory anyway, making this particular error unlikely in practice. Otherwise the algorithm seems reasonably good: it detects transpositions, zeroed-out ranges and other such common errors. It is especially good on localized errors, detecting all single-bit errors.

I wrote a quick test harness to empirically measure the effectiveness of the hash function. As test data I loaded an imdb dataset dump into master and then concatenated everything in the database datadir except pg_* together, for a total of 2.8GB of data. The test cases I have tried so far are: randomized bit flips (1..4 per page), writing a 0x00 or 0xFF byte into each location on the page (one-byte errors), zeroing out the end of the page starting from a random location, and writing a segment of random garbage into the page. The partial-write and bit-flip tests were repeated 1000 times per page.
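For anyone who wants to play with this, a toy harness along these lines is easy to build. Below is a hypothetical sketch (names like `checksum`, `miss_rate` and the error-model functions are mine, not from the patch): it models the multiply-xor idea as a single sequential 32-bit FNV-1a-style accumulator over 16-bit words. The real patch runs many accumulators in parallel with SIMD, and that lane structure is where the LSB blind spot comes from, so the rates this toy produces will not match the figures from the 2.8GB corpus below.

```python
import random

# Hypothetical simplified model of a multiply-xor (FNV-1a-style) checksum:
# one sequential 32-bit accumulator over 16-bit words. NOT the patch's
# actual SIMD algorithm, just a sketch of the testing methodology.
FNV_PRIME = 16777619        # odd, so multiplication is invertible mod 2**32
MASK32 = 0xFFFFFFFF

def checksum(words):
    acc = 0x811C9DC5        # FNV-1a offset basis
    for w in words:
        acc = ((acc ^ w) * FNV_PRIME) & MASK32
    return acc

def miss_rate(corrupt, trials=200, page_words=256):
    """Monte-Carlo estimate of the fraction of corrupted pages whose
    checksum still matches the original (i.e. undetected errors).
    Real pages are 8kB (4096 words); smaller defaults keep this fast."""
    rng = random.Random(42)
    misses = 0
    for _ in range(trials):
        page = [rng.getrandbits(16) for _ in range(page_words)]
        good = checksum(page)
        if checksum(corrupt(page[:], rng)) == good:
            misses += 1
    return misses / trials

def single_bit_flip(page, rng):
    # Flip one random bit somewhere on the page.
    page[rng.randrange(len(page))] ^= 1 << rng.randrange(16)
    return page

def paired_lsb_flip(page, rng):
    # The systemic worst case discussed above: an even number of errors,
    # each on the least significant bit of a 16-bit word (stuck row line).
    i, j = rng.sample(range(len(page)), 2)
    page[i] ^= 1
    page[j] ^= 1
    return page

if __name__ == "__main__":
    print("single bit flip miss rate:", miss_rate(single_bit_flip))
    print("paired LSB flip miss rate:", miss_rate(paired_lsb_flip))
```

In this sequential full-width model every step is a bijection on the accumulator (xor is invertible, and multiplying by an odd constant is invertible mod 2**32), so any single corrupted word is always detected; the parallel 16-bit-lane variant gives up some of that, which is exactly the trade-off being measured.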
The results so far:

Test             Detects       Miss rate
----------------------------------------
Single bit flip  100.000000%   1:inf
Double bit flip   99.230267%   1:130
Triple bit flip   99.853346%   1:682
Quad bit flip     99.942418%   1:1737
Write 0x00 byte   99.999999%   1:148602862
Write 0xFF byte   99.999998%   1:50451919
Partial write     99.922942%   1:12988
Write garbage     99.998435%   1:63885

Unless somebody tells me not to waste my time, I'll go ahead and come up with a workable patch by Monday.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de