Hi guys,

I'm contemplating implementing a new fast hash algorithm in Illumos' ZFS implementation to supplant the currently utilized sha256. On modern 64-bit CPUs SHA-256 is actually much slower than SHA-512 and indeed much slower than many of the SHA-3 candidates, so I went out and did some testing (details attached) on a possible new hash algorithm that might improve on this situation.
However, before I start out on a pointless endeavor, I wanted to probe the field of ZFS users, especially those using dedup, on whether their workloads would benefit from a faster hash algorithm (and hence, lower CPU utilization). Developments of late have suggested to me three possible candidates:

 * SHA-512: simplest to implement (since the code is already in the kernel) and provides a modest performance boost of around 60%.

 * Skein-512: overall fastest of the SHA-3 finalists and much faster than SHA-512 (around 120-150% faster than the current sha256).

 * Edon-R-512: probably the fastest general purpose hash algorithm I've ever seen (upward of 300% speedup over sha256), but might have potential security problems (though I don't think this is of any relevance to ZFS, as it doesn't use the hash for any kind of security purposes, but only for data integrity & dedup).

My testing procedure: nothing sophisticated, I took the implementation of sha256 from the Illumos kernel and simply ran it on a dedicated psrset (where possible with a whole CPU dedicated, even if only to a single thread) - I tested both the generic C implementation and the Intel assembly implementation. The Skein and Edon-R implementations are in C optimized for 64-bit architectures from the respective authors (the most up to date versions I could find). All code has been compiled using GCC 3.4.3 from the repos (the same that can be used for building Illumos). Sadly, I don't have access to Sun Studio.

Cheers,
--
Saso
Hash performance on 10 GB of data
gcc (GCC) 3.4.3 (csl-sol210-3_4-20050802)
CFLAGS: -O3 -fomit-frame-pointer -m64

MACHINE #1
CPU: dual AMD Opteron 4234
Options: single thread on no-intr whole-CPU psrset

Algorithm        Result               Improvement
sha256 (ASM)     21.19 cycles/byte    (baseline)
sha256 (C)       27.66 cycles/byte    -23.34%
sha512 (ASM)     13.48 cycles/byte    57.20%
sha512 (C)       17.35 cycles/byte    22.13%
Skein-512 (C)     8.95 cycles/byte    136.76%
Edon-R-512 (C)    4.94 cycles/byte    328.94%

MACHINE #2
CPU: single AMD Athlon II Neo N36L
Options: single thread on no-intr 1-core psrset

Algorithm        Result               Improvement
sha256 (ASM)     15.68 cycles/byte    (baseline)
sha256 (C)       18.81 cycles/byte    -16.64%
sha512 (ASM)      9.95 cycles/byte    57.59%
sha512 (C)       11.84 cycles/byte    32.43%
Skein-512 (C)     6.25 cycles/byte    150.88%
Edon-R-512 (C)    3.66 cycles/byte    328.42%

MACHINE #3
CPU: dual Intel Xeon E5645
Options: single thread on no-intr whole-CPU psrset

Algorithm        Result               Improvement
sha256 (ASM)     15.49 cycles/byte    (baseline)
sha256 (C)       17.90 cycles/byte    -13.46%
sha512 (ASM)      9.88 cycles/byte    56.78%
sha512 (C)       11.44 cycles/byte    35.40%
Skein-512 (C)     6.88 cycles/byte    125.15%
Edon-R-512 (C)    3.35 cycles/byte    362.39%

MACHINE #4
CPU: single Intel Xeon E5405
Options: single thread on no-intr 1-core psrset

Algorithm        Result               Improvement
sha256 (ASM)     17.45 cycles/byte    (baseline)
sha256 (C)       18.34 cycles/byte    -4.85%
sha512 (ASM)     10.24 cycles/byte    70.41%
sha512 (C)       11.72 cycles/byte    48.90%
Skein-512 (C)     7.32 cycles/byte    138.39%
Edon-R-512 (C)    3.86 cycles/byte    352.07%

MACHINE #5
CPU: dual Intel Xeon E5450
Options: single thread on no-intr whole-CPU psrset

Algorithm        Result               Improvement
sha256 (ASM)     16.43 cycles/byte    (baseline)
sha256 (C)       18.50 cycles/byte    -11.19%
sha512 (ASM)     10.37 cycles/byte    58.44%
sha512 (C)       11.85 cycles/byte    38.65%
Skein-512 (C)     7.38 cycles/byte    122.63%
Edon-R-512 (C)    3.88 cycles/byte    323.45%
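For anyone wanting to check the numbers: the Improvement column appears to be the speedup relative to the sha256 (ASM) baseline, i.e. (baseline cycles/byte / result cycles/byte - 1) * 100. That formula is my reading of the figures, not something stated above, and the function name below is made up for illustration. A minimal Python sketch using the MACHINE #1 results; it reproduces the listed percentages to within a few hundredths of a percent, presumably because the originals were computed from unrounded cycle counts.

```python
def improvement_pct(baseline_cpb, result_cpb):
    """Speedup over the baseline in percent (assumed formula).

    Both arguments are in cycles/byte, so lower is faster; a result
    slower than the baseline gives a negative percentage.
    """
    return (baseline_cpb / result_cpb - 1.0) * 100.0


# cycles/byte figures copied from the MACHINE #1 results
machine1 = {
    "sha256 (ASM)": 21.19,
    "sha256 (C)": 27.66,
    "sha512 (ASM)": 13.48,
    "sha512 (C)": 17.35,
    "Skein-512 (C)": 8.95,
    "Edon-R-512 (C)": 4.94,
}

baseline = machine1["sha256 (ASM)"]
for name, cpb in machine1.items():
    pct = improvement_pct(baseline, cpb)
    print(f"{name:15s} {cpb:6.2f} cycles/byte {pct:8.2f}%")
```

The same function works for the other machines by swapping in their cycles/byte values and baseline.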
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss