On Sat, Sep 13, 2014 at 10:27 PM, k...@rice.edu <k...@rice.edu> wrote:
> On Sat, Sep 13, 2014 at 09:50:55PM -0300, Arthur Silva wrote: > > On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > > Andres Freund <and...@2ndquadrant.com> writes: > > > > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote: > > > >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthur...@gmail.com> > > > wrote: > > > >>> That's not entirely true. CRC-32C beats pretty much everything with > > > the same > > > >>> length quality-wise and has both hardware implementations and > highly > > > >>> optimized software versions. > > > > > > >> For better or for worse CRC is biased by detecting all single bit > > > >> errors, the detection capability of larger errors is slightly > > > >> diminished. The quality of the other algorithms I mentioned is also > > > >> very good, while producing uniformly varying output. > > > > > > > There's also much more literature about the various CRCs in > comparison > > > > to some of these hash allgorithms. > > > > > > Indeed. CRCs have well-understood properties for error detection. > > > Have any of these new algorithms been analyzed even a hundredth as > > > thoroughly? No. I'm unimpressed by evidence-free claims that > > > something else is "also very good". > > > > > > Now, CRCs are designed for detecting the sorts of short burst errors > > > that are (or were, back in the day) common on phone lines. You could > > > certainly make an argument that that's not the type of threat we face > > > for PG data. However, I've not seen anyone actually make such an > > > argument, let alone demonstrate that some other algorithm would be > better. > > > To start with, you'd need to explain precisely what other error pattern > > > is more important to defend against, and why. > > > > > > regards, tom lane > > > > > > > Mysql went this way as well, changing the CRC polynomial in 5.6. > > > > What we are looking for here is uniqueness thus better error detection. > Not > > avalanche effect, nor cryptographically secure, nor bit distribution. > > As far as I'm aware CRC32C is unbeaten collision wise and time proven. > > > > I couldn't find tests with xxhash and crc32 on the same hardware so I > spent > > some time putting together a benchmark (see attachment, to run it just > > start run.sh) > > > > I included a crc32 implementation using ssr4.2 instructions (which works > on > > pretty much any Intel processor built after 2008 and AMD built after > 2012), > > a portable Slice-By-8 software implementation and xxhash since it's the > > fastest software 32bit hash I know of. > > > > Here're the results running the test program on my i5-4200M > > > > crc sb8: 90444623 > > elapsed: 0.513688s > > speed: 1.485220 GB/s > > > > crc hw: 90444623 > > elapsed: 0.048327s > > speed: 15.786877 GB/s > > > > xxhash: 7f4a8d5 > > elapsed: 0.182100s > > speed: 4.189663 GB/s > > > > The hardware version is insanely and works on the majority of Postgres > > setups and the fallback software implementations is 2.8x slower than the > > fastest 32bit hash around. > > > > Hopefully it'll be useful in the discussion. > > Thank you for running this sample benchmark. It definitely shows that the > hardware version of the CRC is very fast, unfortunately it is really only > available on x64 Intel/AMD processors which leaves all the rest lacking. > For current 64-bit hardware, it might be instructive to also try using > the XXH64 version and just take one half of the hash. It should come in > at around 8.5 GB/s, or very nearly the speed of the hardware accelerated > CRC. Also, while I understand that CRC has a very venerable history and > is well studied for transmission type errors, I have been unable to find > any research on its applicability to validating file/block writes to a > disk drive. While it is to quote you "unbeaten collision wise", xxhash, > both the 32-bit and 64-bit version are its equal. Since there seems to > be a lack of research on disk based error detection versus CRC polynomials, > it seems likely that any of the proposed hash functions are on an equal > footing in this regard. As Andres commented up-thread, xxhash comes along > for "free" with lz4. > > Regards, > Ken > For the sake of completeness the results for xxhash64 in my machine xxhash64 speed: 7.365398 GB/s Which is indeed very fast.