Hi guys,

I'm contemplating implementing a new fast hash algorithm in Illumos' ZFS
implementation to supplant the currently utilized sha256. On modern
64-bit CPUs SHA-256 is actually much slower than SHA-512 and indeed much
slower than many of the SHA-3 candidates, so I went out and did some
testing (details attached) on a possible new hash algorithm that might
improve on this situation.

However, before I start out on a pointless endeavor, I wanted to probe
the field of ZFS users, especially those using dedup, on whether their
workloads would benefit from a faster hash algorithm (and hence, lower
CPU utilization). Developments of late have suggested to me three
possible candidates:

 * SHA-512: simplest to implement (since the code is already in the
   kernel) and provides a modest performance boost of around 60%.

 * Skein-512: overall fastest of the SHA-3 finalists and much faster
   than SHA-512 (around 120-150% faster than the current sha256).

 * Edon-R-512: probably the fastest general-purpose hash algorithm I've
   ever seen (upwards of 300% speedup over sha256), but it might have
   security problems (though I don't think this is of any relevance to
   ZFS, as it doesn't use the hash for any kind of security purpose,
   only for data integrity & dedup).
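To make the CPU-utilization argument concrete: a hash costing c cycles/byte on a core clocked at f Hz can sustain at most f/c bytes/s, so hashing a write stream of r bytes/s keeps r*c/f of a core busy. The C sketch below only illustrates that arithmetic. The 2.4 GHz clock is an assumed value chosen for illustration (not the clock of any test machine), and the function names are hypothetical.

```c
#include <assert.h>

/* Upper bound on single-core hash throughput, in bytes/second, for a
 * hash costing `cpb` cycles/byte on a core at a fixed `hz` (assumed
 * constant, i.e. no frequency scaling). */
static double hash_bytes_per_sec(double hz, double cpb)
{
    assert(hz > 0.0 && cpb > 0.0);
    return hz / cpb;
}

/* Fraction of one core spent hashing a stream of `bytes_per_sec`
 * (e.g. dedup'd writes); values above 1.0 mean more than one core. */
static double hash_core_fraction(double bytes_per_sec, double hz, double cpb)
{
    assert(bytes_per_sec >= 0.0);
    return bytes_per_sec / hash_bytes_per_sec(hz, cpb);
}
```

With the machine #1 figures and the assumed 2.4 GHz clock, hashing a 100 MB/s write stream would take roughly 0.88 of a core with sha256 (ASM) at 21.19 cycles/byte, versus about 0.21 of a core with Edon-R-512 at 4.94 cycles/byte.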

My testing procedure: nothing sophisticated. I took the implementation
of sha256 from the Illumos kernel and simply ran it on a dedicated
psrset (where possible with a whole CPU dedicated, even if only to a
single thread); I tested both the generic C implementation and the
Intel assembly implementation. The Skein and Edon-R implementations are
the respective authors' C code optimized for 64-bit architectures (the
most up-to-date versions I could find). All code was compiled with
GCC 3.4.3 from the repos (the same compiler that can be used to build
Illumos). Sadly, I don't have access to Sun Studio.
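A procedure like the one above boils down to hashing a buffer many times on an isolated CPU and dividing the cycles spent by the bytes hashed. The following is a minimal C sketch of such a harness, not the code behind the attached numbers. It assumes POSIX clock_gettime(CLOCK_MONOTONIC), converts wall time to cycles using a caller-supplied fixed clock rate (frequency scaling off), and uses 64-bit FNV-1a only as a stand-in hash so the example compiles on its own; the hash_fn type and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical hash interface: returns a 64-bit digest of buf[0..len). */
typedef uint64_t (*hash_fn)(const uint8_t *buf, size_t len);

/* Stand-in hash (64-bit FNV-1a), NOT one of the benchmarked algorithms.
 * Wrap sha256/sha512/Skein/Edon-R in the same signature to test them. */
static uint64_t fnv1a64(const uint8_t *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hash `chunk` `iters` times and return the cost in cycles/byte,
 * assuming the core runs at a fixed `hz`. Calling through a volatile
 * function pointer stops the compiler from hoisting the loop-invariant
 * call; the sum of all digests goes to *sink so the work is kept. */
static double cycles_per_byte(hash_fn fn, const uint8_t *chunk,
    size_t chunk_len, size_t iters, double hz, uint64_t *sink)
{
    hash_fn volatile vfn = fn;
    uint64_t acc = 0;
    double t0, elapsed;
    size_t i;

    assert(chunk_len > 0 && iters > 0 && hz > 0.0);
    t0 = now_sec();
    for (i = 0; i < iters; i++)
        acc += vfn(chunk, chunk_len);
    elapsed = now_sec() - t0;
    *sink = acc;
    return elapsed * hz / ((double)chunk_len * (double)iters);
}
```

On Solaris/Illumos the benchmark process can be started inside the dedicated processor set with `psrset -e`, which keeps other threads off the measured CPU.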

Cheers,
--
Saso
Hash performance on 10 GB of data
gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802)
CFLAGS: -O3 -fomit-frame-pointer -m64

MACHINE #1
        CPU: dual AMD Opteron 4234
        Options: single thread on no-intr whole-CPU psrset

        Algorithm       Result                  Improvement
        sha256 (ASM)    21.19 cycles/byte       (baseline)
        sha256 (C)      27.66 cycles/byte       -23.34%

        sha512 (ASM)    13.48 cycles/byte       57.20%
        sha512 (C)      17.35 cycles/byte       22.13%

        Skein-512 (C)   8.95 cycles/byte        136.76%
        Edon-R-512 (C)  4.94 cycles/byte        328.94%

MACHINE #2
        CPU: single AMD Athlon II Neo N36L
        Options: single thread on no-intr 1-core psrset

        Algorithm       Result                  Improvement
        sha256 (ASM)    15.68 cycles/byte       (baseline)
        sha256 (C)      18.81 cycles/byte       -16.64%

        sha512 (ASM)    9.95 cycles/byte        57.59%
        sha512 (C)      11.84 cycles/byte       32.43%

        Skein-512 (C)   6.25 cycles/byte        150.88%
        Edon-R-512 (C)  3.66 cycles/byte        328.42%

MACHINE #3
        CPU: dual Intel Xeon E5645
        Options: single thread on no-intr whole-CPU psrset

        Algorithm       Result                  Improvement
        sha256 (ASM)    15.49 cycles/byte       (baseline)
        sha256 (C)      17.90 cycles/byte       -13.46%

        sha512 (ASM)    9.88 cycles/byte        56.78%
        sha512 (C)      11.44 cycles/byte       35.40%

        Skein-512 (C)   6.88 cycles/byte        125.15%
        Edon-R-512 (C)  3.35 cycles/byte        362.39%

MACHINE #4
        CPU: single Intel Xeon E5405
        Options: single thread on no-intr 1-core psrset

        Algorithm       Result                  Improvement
        sha256 (ASM)    17.45 cycles/byte       (baseline)
        sha256 (C)      18.34 cycles/byte       -4.85%

        sha512 (ASM)    10.24 cycles/byte       70.41%
        sha512 (C)      11.72 cycles/byte       48.90%

        Skein-512 (C)   7.32 cycles/byte        138.39%
        Edon-R-512 (C)  3.86 cycles/byte        352.07%

MACHINE #5
        CPU: dual Intel Xeon E5450
        Options: single thread on no-intr whole-CPU psrset

        Algorithm       Result                  Improvement
        sha256 (ASM)    16.43 cycles/byte       (baseline)
        sha256 (C)      18.50 cycles/byte       -11.19%

        sha512 (ASM)    10.37 cycles/byte       58.44%
        sha512 (C)      11.85 cycles/byte       38.65%

        Skein-512 (C)   7.38 cycles/byte        122.63%
        Edon-R-512 (C)  3.88 cycles/byte        323.45%

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
