Hi all,

Thanks for your benchmarking help and explanations.
Let me try to summarize.

* We need to consider each of the algorithms md5, sha1, ..., sha256
  separately, because each algorithm has different performance
  characteristics [1]. This is due to the following factors:
    - Some non-Intel hardware has crypto devices. [2]
    - Intel hardware has special instructions for specific crypto
      algorithms. [3][4]
    - The Linux kernel has specially optimized code for specific crypto
      algorithms. [4]

* For the afalg_stream case (with regular files), for all algorithms,
  kernel crypto is faster than user-space crypto, for sizes N > N_0.
  Reasons:
    1. The sendfile call avoids copying the file data to user-space.
    2. The in-kernel crypto code _may_ (or may not) be faster than the
       plain C code from gnulib.

* For the afalg_buffer case (and, btw, also the afalg_stream case with
  non-regular files), it depends on the algorithm and CPU capabilities:
    * If the in-kernel crypto code has roughly the same speed as the
      plain C code from gnulib, then we observe that kernel crypto is
      always slower than user-space crypto, because of the added
      overhead of copying the data to kernel space.
    * If the in-kernel crypto code is faster than the plain C code from
      gnulib by at least, say, 10%, then kernel crypto is faster than
      user-space crypto, for sizes N > N_0, because the faster
      algorithm outweighs the cost of copying the data to kernel space.

* The reasons for our disappointment are:
    - The original presentation [2] was misleading because, as Assaf
      noticed [5], a large portion of the reported speedup (at least
      for Intel processors) is due to a test case that
        1. is a corner case,
        2. exhibits a speedup that is due to sendfile(), not a
           different crypto implementation.
      Lesson to be learned: When you present a new feature and motivate
      it with speedups, please always also include an _average_ use
      case (i.e. non-sparse files, or memory regions not completely
      filled with zeroes)!
    - We all have access to machines with x86_64 CPUs, and only some of
      them have special crypto instructions.
    - The system calls have some cost. [6]

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2018-05/msg00043.html
[2] https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00062.html
[3] https://en.wikipedia.org/wiki/AES_instruction_set
[4] https://lists.gnu.org/archive/html/bug-gnulib/2018-05/msg00038.html
[5] https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00088.html
[6] https://lists.gnu.org/archive/html/bug-gnulib/2018-05/msg00044.html
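
To make the break-even size N_0 from the afalg_buffer discussion a bit more concrete, here is a toy cost model in Python. It is only a sketch and not taken from any of the benchmarks: the function name and all parameters (a fixed per-call overhead, and throughputs for the user-space code, the in-kernel code, and the copy to kernel space) are assumptions chosen for illustration. It assumes user-space time is N / user_bps and kernel time is overhead_s + N / copy_bps + N / kernel_bps.

```python
from typing import Optional


def break_even_size(overhead_s: float, user_bps: float,
                    kernel_bps: float, copy_bps: float) -> Optional[float]:
    """Return the size N_0 (in bytes) above which the kernel path wins.

    Hypothetical model, not measured data:
      user-space time:  N / user_bps
      kernel time:      overhead_s + N / copy_bps + N / kernel_bps
    Returns None when the kernel path never catches up.
    """
    # Time saved per byte by going through the kernel, after paying
    # for the copy to kernel space.
    per_byte_gain = 1.0 / user_bps - (1.0 / kernel_bps + 1.0 / copy_bps)
    if per_byte_gain <= 0:
        # Same-speed (or slower) kernel code: the copy and the fixed
        # overhead are never amortized, so there is no break-even size.
        return None
    return overhead_s / per_byte_gain


if __name__ == "__main__":
    # Made-up numbers: 10 us of fixed overhead, 500 MB/s user-space code,
    # 5 GB/s copy rate.
    print(break_even_size(10e-6, 500e6, 500e6, 5e9))  # kernel code same speed
    print(break_even_size(10e-6, 500e6, 1e9, 5e9))    # kernel code 2x faster
```

In this model, a kernel implementation of equal speed never wins, while a sufficiently faster one wins above some finite N_0. Where exactly the threshold lies depends on the real overhead and copy cost, which only measurements can tell.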