On Tue, Aug 05, 2025 at 01:49:31PM +0900, Simon Richter wrote:
> Hi,
>
> On 8/5/25 07:59, Eric Biggers wrote:
>
> > > md5sum uses the kernel's MD5 code:
>
> > What? That's crazy. Userspace MD5 code would be faster and more
> > reliable. No need to make syscalls, transfer data to and from the
> > kernel, have an external dependency, etc. Is this the coreutils md5sum?
> > We need to get this reported and fixed.
>
> The userspace API allows zero-copy transfers from userspace, and AFAIK also
> directly operating on files without ever transferring the data to userspace
> (so we save one copy).
>
> Userspace requests are also where the asynchronous hardware offload units
> get to chomp on large blocks of data while the CPU is doing something else:
>
> $ time dd if=test.bin of=/dev/zero bs=1G # warm up caches
> real 0m1.541s
> user 0m0.000s
> sys 0m0.732s
>
> $ time gzip -9 <test.bin >test.bin.gz # compress with the CPU
> real 2m57.789s
> user 2m55.986s
> sys 0m1.508s
>
> $ time ./gzfht_test test.bin # compress with NEST unit
> real 0m3.207s
> user 0m0.584s
> sys 0m2.487s
>
> $ time gzip -d <test.bin.nx.gz >test.bin.nx # decompress with CPU
> real 1m0.103s
> user 0m57.990s
> sys 0m1.878s
>
> $ time ./gunz_test test.bin.gz # decompress with NEST unit
> real 0m2.722s
> user 0m0.200s
> sys 0m1.872s
>
> That's why I'm objecting to measuring the general usefulness of hardware
> crypto units by the standards of fscrypt, which has an artificial limitation
> of never submitting blocks larger than 4kB: there are other use cases that
> don't have that limitation, and where the overhead is negligible because it
> is incurred only once for a few gigabytes of data.
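The "userspace API" mentioned above is presumably the kernel's AF_ALG socket interface. As a minimal sketch, assuming a Linux kernel with AF_ALG and its "md5" hash available, the hypothetical helper `afalg_md5` below shows what computing an MD5 digest through the kernel looks like from userspace. It uses only the plain copying path (one `write()`, one `read()`); the zero-copy variant would feed data with `vmsplice()`/`splice()` instead, which this sketch does not show.

```c
#include <assert.h>
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: hash `len` bytes with the kernel's "md5" via AF_ALG.
 * Returns 0 on success, -1 if AF_ALG/md5 is unavailable or any call fails.
 * Copying path only; a zero-copy caller would use vmsplice()/splice(). */
static int afalg_md5(const void *data, size_t len, unsigned char digest[16])
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "md5",
	};
	int tfm, op, ret = -1;

	assert(digest != NULL);

	/* Transform socket: selects the algorithm by type and name. */
	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0)
		return -1;
	if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out_tfm;

	/* Operation socket: carries the data for one hash computation. */
	op = accept(tfm, NULL, 0);
	if (op < 0)
		goto out_tfm;

	/* A write without MSG_MORE submits the data and finalizes the hash. */
	if (write(op, data, len) != (ssize_t)len)
		goto out_op;
	if (read(op, digest, 16) == 16)
		ret = 0;

out_op:
	close(op);
out_tfm:
	close(tfm);
	return ret;
}
```

Every call sets up two sockets and makes several syscalls before any hashing happens, which is the kind of fixed per-request cost being debated in this thread: negligible for gigabytes of input, significant for small ones.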
>
> That's why I suggested changing from a priority field to "speed" and
> "overhead" fields, and calculate priority for each application as
> (size/speed+overhead) -- smallest number wins, size is what the application
> expects to use as the typical request size (which for fscrypt and IPsec is
> on the small side, so it would always select the CPU unless there was a
> low-overhead offload engine available)
>
> This probably needs some adjustment to allow selecting a low-power
> implementation (e.g. on mobile, I'd want to use offloading for fscrypt even
> if it is slower), and model request batching which reduces the overhead in a
> busy system, but it should be a good start.
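The selection rule proposed above can be sketched as follows. The struct and field names (`impl_cost`, `speed`, `overhead`), the helpers, and the example figures are all hypothetical illustrations, not an existing kernel API; today the crypto API chooses among implementations using the static `cra_priority` field of `struct crypto_alg`. Each candidate's estimated cost for the caller's typical request size is size/speed + overhead, and the smallest estimate wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-implementation cost parameters (not a kernel struct). */
struct impl_cost {
	const char *name;
	double speed;    /* throughput in bytes per microsecond */
	double overhead; /* fixed per-request cost in microseconds */
};

/* Estimated time in microseconds for one request of `size` bytes. */
static double request_cost(const struct impl_cost *impl, double size)
{
	return size / impl->speed + impl->overhead;
}

/* Return the implementation with the smallest estimated cost for the
 * caller's typical request size, or NULL if n == 0. Ties keep the first. */
static const struct impl_cost *select_impl(const struct impl_cost *impls,
					   size_t n, double typical_size)
{
	const struct impl_cost *best = NULL;
	double best_cost = 0.0;

	assert(n == 0 || impls != NULL);
	for (size_t i = 0; i < n; i++) {
		double c = request_cost(&impls[i], typical_size);

		if (best == NULL || c < best_cost) {
			best = &impls[i];
			best_cost = c;
		}
	}
	return best;
}

/* Made-up example figures, for illustration only. */
static const struct impl_cost example_impls[] = {
	{ "cpu",     1000.0,  0.1 },  /* ~1 GB/s, almost no setup cost */
	{ "offload", 10000.0, 50.0 }, /* ~10 GB/s, expensive submission */
};
```

With these made-up figures the CPU wins for a 4 KiB request (about 4.2 µs vs. 50.4 µs) and the offload engine wins for a 1 GiB request; the crossover lies around 55,000 bytes. A caller like fscrypt that declares a small typical size would therefore keep getting the CPU implementation, as described above.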

What does this have to do with this thread, which is about the PowerPC
optimized MD5 code?

- Eric