Re: [PATCH v2 1/4] sha1sum: use AF_ALG when available

Matteo Croce Wed, 25 Apr 2018 18:26:46 -0700

On Wed, Apr 25, 2018 at 7:34 PM, Assaf Gordon <assafgor...@gmail.com> wrote:
> Hello Matteo,
>
> On Wed, Apr 25, 2018 at 01:26:08PM +0200, Matteo Croce wrote:
>> Linux supports accessing kernel crypto API via AF_ALG since
>> version 2.6.38. Coreutils uses libcrypto when available and fallbacks to
>> generic C implementation of various hashing functions.
>
> Not exactly: coreutils (and every project which uses gnulib's md5/sha* 
> modules)
> defaults to using the generic C implementation.
> gnulib automatically adds the option "--with-openssl" to "./configure",
> which can be set explicitly to 'yes' or 'auto'.
>
> To be conservative, I would suggest following the same method:
> Add a "configure" flag instead of enabling a completely new interface
> by default for every downstream project,
> especially that it is tied to external components (the kernel / modules / 
> etc).
>
> If a distribution wants to enable by default, they can do it
> for their package (and ensure it works well with their kernel).
>


Hi Assaf,
I will have a look at the autotools to add such option to configure
and check for something like HAVE_LINUX_CRYPTO

>> Use afalg_stream() only in sha1sum for now, but other hashes are possible.
>> The speed gain really depends on the CPU type, on systems which doesn't use
>> libcrypto ranges from ~10% to 320%.
>
> It would be beneficial to know the improvements against coreutils+libcrypto
> on the same CPUs, because if users build with "--openssl=yes" (which is 
> currently
> considered the "fast" implementation), they would not benefit from your patch.
> So they'll need to decide which one is preferable (and OpenSSL/libreSSL is 
> faster
> on more CPUs and OSs, not just linux on a subset of CPUs).
>

I did comparations against libcrypto too, the results depends on the
CPU and file size but generally libcrypto and af_alg are paragonable.
Of course they are way faster than the generic C implementation.
on x86_64 and file smaller than 2 GB af_alg is slightly faster than
libcrypto, probably because of the sendfile usage:

$ grep -m1 '^model name' /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
$ uname -a
Linux turbo 4.16.2 #71 SMP Sat Apr 21 20:14:08 CEST 2018 x86_64 x86_64
x86_64 GNU/Linux

$ time ./sha1sum-plain 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m2,718s
user    0m2,548s
sys     0m0,170s

$ time ./sha1sum-libcrypto 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m1,707s
user    0m1,538s
sys     0m0,170s

$ time ./sha1sum-afalg 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m1,680s
user    0m0,000s
sys     0m1,668s

on x86_64 and huge files, af_alg is slightly slower than libcrypto,
probably because two syscalls are needed per block:

$ time ./sha1sum-plain 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    0m22,410s
user    0m20,159s
sys     0m2,250s
$ time ./sha1sum-libcrypto 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    0m13,873s
user    0m12,263s
sys     0m1,610s
$ time ./sha1sum-afalg 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    0m14,124s
user    0m0,080s
sys     0m14,044s


I have similar results on an arm64 board running a quad core Cortex
A72, small file:

$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 1

$ uname -a
Linux macchiatobin 4.16.0 #8 SMP Sat Apr 21 17:55:12 UTC 2018 aarch64
aarch64 aarch64 GNU/Linux

$ time ./sha1sum-plain 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m9.348s
user    0m8.176s
sys     0m1.171s
$ time ./sha1sum-libcrypto 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m2.232s
user    0m1.721s
sys     0m0.510s
$ time ./sha1sum-afalg 2g.bin
752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin

real    0m2.219s
user    0m0.000s
sys     0m2.219s

and a bigger one:

$ time ./sha1sum-plain 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    1m14.968s
user    1m5.860s
sys     0m9.106s

$ time ./sha1sum-libcrypto 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    0m20.706s
user    0m14.674s
sys     0m6.001s

$ time ./sha1sum-afalg 16g.bin
6b9df4760e7308efe6d9def1293b1653ff24ace1  16g.bin

real    0m21.608s
user    0m0.142s
sys     0m21.462s

Unfortunately I don't have a board with a crypto HW engine available,
but given the speed of such chips, the results will vary a lot in
favour of af_alg.
Also, keep in mind that some distribution don't want or can't use
libssl for some reasons (licensing, firmware footprint, etc.)

>> This is a test on a Intel(R) Xeon(R) CPU E3-1265L V2 and Debian stretch:
>>
>>     $ truncate -s 2GB 2g.bin
>>     $ time sha1sum 2g.bin
>>     752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin
>>
>>     real    0m4.829s
>>     user    0m4.437s
>>     sys     0m0.391s
>>     $ time ./sha1sum-afalg 2g.bin
>>     752ef2367f479e79e4f0cded9c270c2890506ab0  2g.bin
>>
>>     real    0m3.164s
>>     user    0m0.000s
>>     sys     0m3.162s
>
> Regarding the above measurement:
> On most modern filesystems "truncate -s" would create a sparse file,
> requiring very little I/O.
> And if the measurements above are indeed mostly CPU-bound
> and not IO-bound, it indicates that hasing in the kernel
> is only somewhat faster then in user-mode, but not significantly so
> if you exclude I/O (not as dramatic difference as measured in
> your previous email).
>

That was intended behaviour. Reading from a slow disk would have
reported similar times between versions.

> This also brings the question: in the previous emails you report
> the improved *-afalg versions after the default versions - did
> you clear the cache to exclude I/O effects?
> (e.g. "sync; echo 3 > /proc/sys/vm/drop_caches")
>

as before, that would clear the block cache and the I/O could be a
bottleneck, probably not what we want.

>> +    {
>> +      /* sendfile() not possible, do a classic read-write loop */
>> +      while ((size = fread (buf, 1, sizeof (buf), stream)))
>> +        {
>> +          if (send (ofd, buf, size, size == sizeof (buf) ? MSG_MORE : 0) == 
>> -1)
>> +            {
>
> Few comments:
> 1. In the above loop, wouldn't a file that's an exact multiple of BLOCKSIZE
> never be sent 0 as the 4th paramater?
> And if so, assuming hashing still works because "MSG_MORE" is just a hint,
> why not always send MSG_MORE ?
>

I tried it before posting the patch, it seems that doing the last
write with MSG_MORE still generates a valid hash. As you guessed, it's
just an hint:

$ dd status=none if=/dev/zero bs=32k count=4 |sha1sum
67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19  -
$ dd status=none if=/dev/zero bs=32k count=4 |strace -e trace=%network
./sha1sum-afalg
socket(AF_ALG, SOCK_SEQPACKET, 0)       = 3
bind(3, {sa_family=AF_ALG,
sa_data="hash\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0sha1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"},
88) = 0
accept(3, NULL, NULL)                   = 4
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
sendto(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
32768, MSG_MORE, NULL, 0) = 32768
67dfd19f3eb3649d6f3f6631e44d0bd36b8d8d19  -
+++ exited with 0 +++

Yes, we can always pass MSG_MORE to the kernel, the hash will
correctly generated when calling read()

> 2. The code for sha*_stream() functions is a bit more elaborate,
> and checks ferror/feof to accomodate EAGAIN - it might be useful
> to replicate it here.
>
>> +  if (size != hashlen)
>> +    {
>> +      fprintf (stderr, "Error from read (%zd vs %zd bytes): %s\n",
>> +               size, hashlen, strerror (errno));
>> +      ret = -EIO;
>
> The fprintf might be problematic:
> First, this is a cryptic error message, in the sense that
> it indicate an I/O error (hinted by the word "read"),
> but is actually a kernel crypto/driver issue.
> If a user sees this message, there's no chance they'll know
> what the problem is. It's better to be more explicit that
> this relates to kernel crypto API problem.
>

Indeed, it should be reworded better.

> Second, all other sha*_stream functions never output anything
> to STDERR. Their interface is returning 0 for success, 1 for failure,
> and setting errno for failure. Not sure if this is an official API
> agreement or just de-facto, but it means that users of sha*_stream
> functions should now be aware there might be output.
>
> Lastly, if the decision is to keep the error message,
> it's likely better to use gnulib's error() function, like so:
>
>   error (0, errno, _("some error message %zd"), size);
>
> Which will take care of strerror, and also enable translation.
>

I wasn't sure if put the error message or not. I think that a library
should not clobber the output with text, probably I did a mistake,
returning 1 and set errno seems a better idea.

> ----
>
> On more general note, while the Crypto API is available since version 2.6.38,
> There are still some issues popping here and there.
> for example:
>   https://security-tracker.debian.org/tracker/CVE-2016-8646
>   https://bugzilla.redhat.com/show_bug.cgi?id=1395896
>

That's bad. I'm not a security expert at all, but I did a quick search
and I've found that also libcrypt had some CVEs.
Nobody's perfect :)

> Another example, the libkcapi [1] library (referenced from the kernel crypto 
> api documentation)
> contains a work-around for a bug in kernels pre-4.9 [2].
> I don't know if this is relevant for your implementation or not,
> but if it is, it's worth checking for this and other issues.
>   [1] http://www.chronox.de/libkcapi.html
>   [2] 
> https://github.com/smuellerDD/libkcapi/commit/b8d5941addb15fe5a716eef24060fbd306c06ec9
>

That's bad too. In this case probably the distribution kernel
maintainers should backport such fix anyway, regardless of af_alg
support in coreutils.

> It also seems OpenSSL version 1.1.0 added support for AF_ALG.
> Perhaps it's better to leave it to them (since gnulib can already
> use openssl easily)?
>

I've just checked the openssl version on my two reference systems
which are an arm64 Ubuntu 18.04:

$ dpkg -l libssl1.1 |grep ^ii
ii  libssl1.1:arm64 1.1.0g-2ubuntu4 arm64

and an x86_64 Fedora 27:

$ rpm -q openssl-libs
openssl-libs-1.1.0h-3.fc27.x86_64

strace revealed that none of them seems to use af_alg, at least in my
tests. Probably the package maintainer had to enable it?

Regards,
-- 
Matteo Croce
per aspera ad upstream

Re: [PATCH v2 1/4] sha1sum: use AF_ALG when available

Reply via email to