Gilles Gouaillardet via users writes:
> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).
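For readers unfamiliar with the idea, here is a minimal sketch of what "one build that runs on both old and new x86" can look like. It is not the op/avx code: the names sum_fn, sum_generic, sum_avx2, sum_avx512 and select_sum are hypothetical, and it assumes GCC or Clang on x86, whose __builtin_cpu_supports() and target attribute it relies on. Each variant is the same plain loop; only the instruction set the compiler may use differs, and the choice is made once at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every variant of the (hypothetical) sum kernel. */
typedef void (*sum_fn)(const int32_t *in, int32_t *inout, size_t n);

/* Baseline variant: built for the default target, so it runs on any CPU. */
static void sum_generic(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1

/* Same loop; the compiler is allowed to emit AVX2 instructions here only. */
__attribute__((target("avx2")))
static void sum_avx2(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}

/* Same loop; the compiler is allowed to emit AVX-512F instructions here only. */
__attribute__((target("avx512f")))
static void sum_avx512(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inout[i] += in[i];
}
#endif

/* Pick the widest variant the running CPU reports, else the baseline. */
static sum_fn select_sum(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return sum_avx512;
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}
```

A caller would run select_sum() once, keep the returned pointer, and call it for every reduction; on an SSE-only machine that pointer is simply sum_generic.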
I take dispatch on micro-architecture for granted, but it doesn't require
You are welcome to provide any data showing that the current
implementation (intrinsics, AVX512) is not the most efficient, and you
are free to open a Pull Request suggesting a better one.
The op/avx component has pretty much nothing to do with scalability:
only one node is req