On Wed, Apr 25, 2018 at 7:34 PM, Assaf Gordon <assafgor...@gmail.com> wrote: > Hello Matteo, > > On Wed, Apr 25, 2018 at 01:26:08PM +0200, Matteo Croce wrote: >> Linux supports accessing kernel crypto API via AF_ALG since >> version 2.6.38. Coreutils uses libcrypto when available and fallbacks to >> generic C implementation of various hashing functions. > > Not exactly: coreutils (and every project which uses gnulib's md5/sha* > modules) > defaults to using the generic C implementation. > gnulib automatically adds the option "--with-openssl" to "./configure", > which can be set explicitly to 'yes' or 'auto'. > > To be conservative, I would suggest following the same method: > Add a "configure" flag instead of enabling a completely new interface > by default for every downstream project, > especially that it is tied to external components (the kernel / modules / > etc). > > If a distribution wants to enable by default, they can do it > for their package (and ensure it works well with their kernel). >
Hi Assaf, I will have a look at the autotools to add such option to configure and check for something like HAVE_LINUX_CRYPTO >> Use afalg_stream() only in sha1sum for now, but other hashes are possible. >> The speed gain really depends on the CPU type, on systems which doesn't use >> libcrypto ranges from ~10% to 320%. > > It would be beneficial to know the improvements against coreutils+libcrypto > on the same CPUs, because if users build with "--openssl=yes" (which is > currently > considered the "fast" implementation), they would not benefit from your patch. > So they'll need to decide which one is preferable (and OpenSSL/libreSSL is > faster > on more CPUs and OSs, not just linux on a subset of CPUs). > I did comparations against libcrypto too, the results depends on the CPU and file size but generally libcrypto and af_alg are paragonable. Of course they are way faster than the generic C implementation. on x86_64 and file smaller than 2 GB af_alg is slightly faster than libcrypto, probably because of the sendfile usage: $ grep -m1 '^model name' /proc/cpuinfo model name : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz $ uname -a Linux turbo 4.16.2 #71 SMP Sat Apr 21 20:14:08 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux $ time ./sha1sum-plain 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m2,718s user 0m2,548s sys 0m0,170s $ time ./sha1sum-libcrypto 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m1,707s user 0m1,538s sys 0m0,170s $ time ./sha1sum-afalg 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m1,680s user 0m0,000s sys 0m1,668s on x86_64 and huge files, af_alg is slightly slower than libcrypto, probably because two syscalls are needed per block: $ time ./sha1sum-plain 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 0m22,410s user 0m20,159s sys 0m2,250s $ time ./sha1sum-libcrypto 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 0m13,873s user 0m12,263s sys 0m1,610s $ time ./sha1sum-afalg 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 0m14,124s user 0m0,080s sys 0m14,044s I have similar results on an arm64 board running a quad core Cortex A72, small file: $ cat /proc/cpuinfo processor : 0 BogoMIPS : 50.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid CPU implementer : 0x41 CPU architecture: 8 CPU variant : 0x0 CPU part : 0xd08 CPU revision : 1 $ uname -a Linux macchiatobin 4.16.0 #8 SMP Sat Apr 21 17:55:12 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux $ time ./sha1sum-plain 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m9.348s user 0m8.176s sys 0m1.171s $ time ./sha1sum-libcrypto 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m2.232s user 0m1.721s sys 0m0.510s $ time ./sha1sum-afalg 2g.bin 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin real 0m2.219s user 0m0.000s sys 0m2.219s and a bigger one: $ time ./sha1sum-plain 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 1m14.968s user 1m5.860s sys 0m9.106s $ time ./sha1sum-libcrypto 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 0m20.706s user 0m14.674s sys 0m6.001s $ time ./sha1sum-afalg 16g.bin 6b9df4760e7308efe6d9def1293b1653ff24ace1 16g.bin real 0m21.608s user 0m0.142s sys 0m21.462s Unfortunately I don't have a board with a crypto HW engine available, but given the speed of such chips, the results will vary a lot in favour of af_alg. Also, keep in mind that some distribution don't want or can't use libssl for some reasons (licensing, firmware footprint, etc.) >> This is a test on a Intel(R) Xeon(R) CPU E3-1265L V2 and Debian stretch: >> >> $ truncate -s 2GB 2g.bin >> $ time sha1sum 2g.bin >> 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin >> >> real 0m4.829s >> user 0m4.437s >> sys 0m0.391s >> $ time ./sha1sum-afalg 2g.bin >> 752ef2367f479e79e4f0cded9c270c2890506ab0 2g.bin >> >> real 0m3.164s >> user 0m0.000s >> sys 0m3.162s > > Regarding the above measurement: > On most modern filesystems "truncate -s" would create a sparse file, > requiring very little I/O. > And if the measurements above are indeed mostly CPU-bound > and not IO-bound, it indicates that hasing in the kernel > is only somewhat faster then in user-mode, but not significantly so > if you exclude I/O (not as dramatic difference as measured in > your previous email). > That was intended behaviour. Reading from a slow disk would have reported similar times between versions. > This also brings the question: in the previous emails you report > the improved *-afalg versions after the default versions - did > you clear the cache to exclude I/O effects? > (e.g. "sync; echo 3 > /proc/sys/vm/drop_caches") > as before, that would clear the block cache and the I/O could be a bottleneck, probably not what we want. >> + { >> + /* sendfile() not possible, do a classic read-write loop */ >> + while ((size = fread (buf, 1, sizeof (buf), stream))) >> + { >> + if (send (ofd, buf, size, size == sizeof (buf) ? MSG_MORE : 0) == >> -1) >> + { > > Few comments: > 1. In the above loop, wouldn't a file that's an exact multiple of BLOCKSIZE > never be sent 0 as the 4th paramater? > And if so, assuming hashing still works because "MSG_MORE" is just a hint, > why not always send MSG_MORE ? > I tried it before posting the patch, it seems that doing the last write with MSG_MORE still generates a valid hash. As you guessed, it's just an hint: $ dd status=none if=/dev/zero bs=32k count=4 |sha1sum 67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19 - $ dd status=none if=/dev/zero bs=32k count=4 |strace -e trace=%network ./sha1sum-afalg socket(AF_ALG, SOCK_SEQPACKET, 0) = 3 bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0sha1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 88) = 0 accept(3, NULL, NULL) = 4 sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, MSG_MORE, NULL, 0) = 32768 sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, MSG_MORE, NULL, 0) = 32768 sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, MSG_MORE, NULL, 0) = 32768 sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, MSG_MORE, NULL, 0) = 32768 67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19 - +++ exited with 0 +++ Yes, we can always pass MSG_MORE to the kernel, the hash will correctly generated when calling read() > 2. The code for sha*_stream() functions is a bit more elaborate, > and checks ferror/feof to accomodate EAGAIN - it might be useful > to replicate it here. > >> + if (size != hashlen) >> + { >> + fprintf (stderr, "Error from read (%zd vs %zd bytes): %s\n", >> + size, hashlen, strerror (errno)); >> + ret = -EIO; > > The fprintf might be problematic: > First, this is a cryptic error message, in the sense that > it indicate an I/O error (hinted by the word "read"), > but is actually a kernel crypto/driver issue. > If a user sees this message, there's no chance they'll know > what the problem is. It's better to be more explicit that > this relates to kernel crypto API problem. > Indeed, it should be reworded better. > Second, all other sha*_stream functions never output anything > to STDERR. Their interface is returning 0 for success, 1 for failure, > and setting errno for failure. Not sure if this is an official API > agreement or just de-facto, but it means that users of sha*_stream > functions should now be aware there might be output. > > Lastly, if the decision is to keep the error message, > it's likely better to use gnulib's error() function, like so: > > error (0, errno, _("some error message %zd"), size); > > Which will take care of strerror, and also enable translation. > I wasn't sure if put the error message or not. I think that a library should not clobber the output with text, probably I did a mistake, returning 1 and set errno seems a better idea. > ---- > > On more general note, while the Crypto API is available since version 2.6.38, > There are still some issues popping here and there. > for example: > https://security-tracker.debian.org/tracker/CVE-2016-8646 > https://bugzilla.redhat.com/show_bug.cgi?id=1395896 > That's bad. I'm not a security expert at all, but I did a quick search and I've found that also libcrypt had some CVEs. Nobody's perfect :) > Another example, the libkcapi [1] library (referenced from the kernel crypto > api documentation) > contains a work-around for a bug in kernels pre-4.9 [2]. > I don't know if this is relevant for your implementation or not, > but if it is, it's worth checking for this and other issues. > [1] http://www.chronox.de/libkcapi.html > [2] > https://github.com/smuellerDD/libkcapi/commit/b8d5941addb15fe5a716eef24060fbd306c06ec9 > That's bad too. In this case probably the distribution kernel maintainers should backport such fix anyway, regardless of af_alg support in coreutils. > It also seems OpenSSL version 1.1.0 added support for AF_ALG. > Perhaps it's better to leave it to them (since gnulib can already > use openssl easily)? > I've just checked the openssl version on my two reference systems which are an arm64 Ubuntu 18.04: $ dpkg -l libssl1.1 |grep ^ii ii libssl1.1:arm64 1.1.0g-2ubuntu4 arm64 and an x86_64 Fedora 27: $ rpm -q openssl-libs openssl-libs-1.1.0h-3.fc27.x86_64 strace revealed that none of them seems to use af_alg, at least in my tests. Probably the package maintainer had to enable it? Regards, -- Matteo Croce per aspera ad upstream