What can we do to get the speedups but avoid the slowdowns in the two
hairy cases, namely
- the afalg_buffer case, and
- the afalg_stream case with non-regular files?
The following approaches come to mind:
* A "tuning" framework like the one from GMP. This is a set of benchmark
programs that the developers use to determine the break-even points
on a platform and write them into a platform-specific gmp-mparam.h file.
Drawback: Who will have the time (and resources) to do this for the
hundreds of x86_64 CPUs and the dozens of ARM CPUs on the market?
* A configure test that compares the speed of the two implementations
and sets a flag in config.h accordingly. (A probe of this kind is
sketched after this list.)
Drawbacks:
- This does not solve the issue for programs distributed in binary form
by a Linux distributor: the configure test runs on the build machine,
not on the machines where the binaries will run.
- The outcome of this configure test may depend on the load of the machine
at the moment 'configure' runs.
- Changes in the kernel (which are likely to arrive due to Meltdown,
Spectre, and Spectre-NG fixes) will affect these comparisons.
- It goes against the goals of "reproducible builds".
* Use the kernel-provided meta-info about the algorithms to decide whether
to use the kernel API.
In detail: Read /proc/crypto at run time. It consists of records
with fields (name, driver, module, priority, internal, type).
(A sketch of this scan, in C, follows after this list.)
- Consider only the records for the names we are interested in.
- Eliminate records with module != "kernel" (since we are not prepared
to handle the module being unloaded while af_alg.c is iterating over
the data).
- No need to eliminate records with internal = "yes", since these have a
name that starts with '__' and are therefore already excluded by the
name filter.
- Eliminate records with priority <= 100, because these are the generic
implementations, which provide no significant speedup compared to the
generic C implementation in gnulib (assuming the gnulib code was
compiled with -O2).
Drawback: Does not work if /proc is not mounted.
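
To make the second approach concrete: the configure test would essentially
compile and run a timing probe like the following and set the config.h flag
according to its exit status. This is only a rough sketch: error checking is
omitted, the 64 MiB buffer size is an arbitrary choice, and gnulib's
sha256.h is assumed to be on the include path.

#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>
#include "sha256.h"             /* gnulib's in-memory SHA-256 */

static double
now (void)
{
  struct timespec t;
  clock_gettime (CLOCK_MONOTONIC, &t);
  return t.tv_sec + t.tv_nsec * 1e-9;
}

int
main (void)
{
  size_t len = 64 * 1024 * 1024;        /* arbitrary probe size */
  char *buf = malloc (len);
  char digest[32];
  memset (buf, 'x', len);

  /* Time gnulib's C implementation.  */
  double start = now ();
  sha256_buffer (buf, len, digest);
  double t_gnulib = now () - start;

  /* Time the kernel's implementation, through the AF_ALG socket API.  */
  struct sockaddr_alg salg = { .salg_family = AF_ALG };
  strcpy ((char *) salg.salg_type, "hash");
  strcpy ((char *) salg.salg_name, "sha256");
  int tfm = socket (AF_ALG, SOCK_SEQPACKET, 0);
  bind (tfm, (struct sockaddr *) &salg, sizeof salg);
  int op = accept (tfm, NULL, NULL);
  start = now ();
  for (size_t off = 0; off < len; )
    {
      ssize_t n = send (op, buf + off, len - off, MSG_MORE);
      if (n <= 0)
        return 2;
      off += (size_t) n;
    }
  read (op, digest, sizeof digest);     /* the read finalizes the hash */
  double t_kernel = now () - start;

  /* Exit status 0: the kernel wins; configure sets the flag.  */
  return t_kernel < t_gnulib ? 0 : 1;
}

Note that this measures exactly one run of each implementation, which is
precisely why the result depends on the momentary load of the machine.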
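
And here is a rough sketch of the /proc/crypto scan of the third approach.
The function name is made up, and the parser relies on the "field : value"
layout of /proc/crypto, with records separated by blank lines.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if /proc/crypto lists an implementation of ALGNAME
   (e.g. "sha256") that lives in the kernel proper (module == "kernel")
   and has priority > 100, i.e. is more than the generic fallback.
   Returns false otherwise, in particular when /proc is not mounted.  */
static bool
proc_crypto_is_accelerated (const char *algname)
{
  FILE *fp = fopen ("/proc/crypto", "r");
  if (fp == NULL)
    return false;               /* /proc is not mounted.  */

  char line[256];
  bool name_matches = false, module_is_kernel = false;
  long priority = 0;
  bool found = false;

  for (;;)
    {
      char *p = fgets (line, sizeof line, fp);
      if (p == NULL || line[0] == '\n')
        {
          /* Blank line or EOF: the current record is complete.
             Records with internal = "yes" need no extra test here,
             since their name starts with '__' and therefore fails
             the name comparison.  */
          if (name_matches && module_is_kernel && priority > 100)
            {
              found = true;
              break;
            }
          name_matches = false; module_is_kernel = false; priority = 0;
          if (p == NULL)
            break;
          continue;
        }
      /* Field lines look like "priority     : 170".  */
      char *value = strchr (line, ':');
      if (value == NULL)
        continue;
      do value++; while (*value == ' ');
      value[strcspn (value, "\n")] = '\0';
      if (strncmp (line, "name ", 5) == 0)
        name_matches = strcmp (value, algname) == 0;
      else if (strncmp (line, "module ", 7) == 0)
        module_is_kernel = strcmp (value, "kernel") == 0;
      else if (strncmp (line, "priority ", 9) == 0)
        priority = strtol (value, NULL, 10);
    }

  fclose (fp);
  return found;
}

af_alg.c would call this once per algorithm name and cache the result, so
that /proc/crypto is parsed at most once per algorithm and process.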
I would favour the third approach.
Bruno