What can we do to get the speedups but avoid the slowdowns, in the two
hairy cases
  - the afalg_buffer case,
  - the afalg_stream case with non-regular files?
The following approaches come to mind:

* A "tuning" framework like the one from GMP. This is a set of benchmark
  programs that the developers use to determine the break-even points on a
  platform and write them into a platform-specific gmp-mparam.h file.
  Drawback: Who will have the time (and resources) to do this for the
  hundreds of x86_64 CPUs and the dozens of ARM CPUs on the market?

* A configure test that compares the speed of the two implementations and
  sets a flag in config.h accordingly.
  Drawbacks:
    - This does not solve the issue for programs distributed through a
      Linux distributor.
    - The outcome of this configure test may depend on the load of the
      machine at the moment 'configure' runs.
    - Changes in the kernel (which are likely to arrive due to Meltdown,
      Spectre, and Spectre-NG fixes) will affect these comparisons.
    - It goes against the goals of "reproducible builds".

* Use the kernel-provided meta-info about the algorithms to decide whether
  to use the kernel API. In detail: Read /proc/crypto at run time. It
  consists of records with fields (name, driver, module, priority,
  internal, type).
    - Consider only the records for the names we are interested in.
    - Eliminate records with module != "kernel" (since we are not ready to
      handle the situation where the module gets unloaded while af_alg.c
      iterates over the data).
    - No need to eliminate records with internal = "yes", since these have
      a name that starts with '__' and are thus already eliminated.
    - Eliminate records with priority <= 100, because these are the generic
      implementations, which provide no significant speedup compared to the
      generic C implementation in gnulib (assuming the gnulib code was
      compiled with -O2).
  Drawback: Does not work if /proc is not mounted.
  (A rough sketch of this filtering is appended below.)

I would favour the third approach.

Bruno
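
PS: For concreteness, here is a rough, untested sketch of what the
/proc/crypto filtering from the third approach could look like in C. The
function name, the simplified record parsing, and the decision to return 0
when /proc/crypto cannot be opened are only illustrative; this is not meant
as the final af_alg.c code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if /proc/crypto lists a usable kernel implementation of ALG
   (e.g. "sha256"), following the filtering rules described above.
   Return 0 otherwise, in particular when /proc is not mounted.  */
static int
kernel_alg_is_fast (const char *alg)
{
  FILE *fp = fopen ("/proc/crypto", "r");
  if (fp == NULL)
    return 0;                   /* /proc not mounted: fall back.  */

  char line[256];
  /* Per-record state, reset at each blank line.  */
  int name_matches = 0, module_is_kernel = 0, priority_ok = 0;
  int found = 0;

  for (;;)
    {
      char *p = fgets (line, sizeof line, fp);
      if (p == NULL || line[0] == '\n')
        {
          /* End of a record: check whether it passed all filters.  */
          if (name_matches && module_is_kernel && priority_ok)
            {
              found = 1;
              break;
            }
          name_matches = module_is_kernel = priority_ok = 0;
          if (p == NULL)
            break;
          continue;
        }
      /* Lines look like "field        : value".  */
      char *colon = strchr (line, ':');
      if (colon == NULL)
        continue;
      *colon = '\0';
      char *value = colon + 1;
      while (*value == ' ')
        value++;
      value[strcspn (value, "\n")] = '\0';
      /* Trim trailing blanks from the field name.  */
      char *end = colon;
      while (end > line && (end[-1] == ' ' || end[-1] == '\t'))
        *--end = '\0';

      if (strcmp (line, "name") == 0)
        /* Internal algorithms have names starting with "__", so they
           never match ALG and need no separate check.  */
        name_matches = (strcmp (value, alg) == 0);
      else if (strcmp (line, "module") == 0)
        module_is_kernel = (strcmp (value, "kernel") == 0);
      else if (strcmp (line, "priority") == 0)
        /* Priority <= 100 marks the generic implementations.  */
        priority_ok = (atoi (value) > 100);
    }

  fclose (fp);
  return found;
}

int
main (void)
{
  printf ("sha256: %s\n",
          kernel_alg_is_fast ("sha256") ? "use kernel" : "use gnulib");
  return 0;
}

The result would only need to be computed once per algorithm name, e.g.
lazily on first use, and could then be cached.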