One motivation is packaging: a single Open MPI build has to run on both
older x86 processors (supporting only SSE) and the latest ones (supporting
AVX512). The op/avx component selects, at runtime, the most efficient
implementation of the vectorized reductions for the processor it is
actually running on.

On Mon, Jul 19, 2021 at 11:11 PM Dave Love via users <
users@lists.open-mpi.org> wrote:

> I meant to ask a while ago about vectorized reductions after I saw a
> paper that I can't now find.  I didn't understand what was behind it.
>
> Can someone explain why you need to hand-code the AVX implementations of
> the reduction operations now used on x86_64?  As far as I remember, the
> paper didn't justify the effort past alluding to a compiler being unable
> to vectorize reductions.  I wonder which compiler(s); the recent ones
> I'm familiar with certainly can if you allow them (or don't stop them --
> icc, sigh).  I've been assured before that GCC can't, but that's
> probably due to using the default correct FP compilation and/or not
> restricting function arguments.  So I wonder what the problem is with
> just using C and a tolerably recent GCC if necessary -- is there
> something else behind this?
>
> Since only x86 is supported, I had a go on ppc64le and with minimal
> effort saw GCC vectorizing more of the base implementation functions
> than are included in the AVX version.  Similarly for x86
> micro-architectures.  (I'd need convincing that AVX512 is worth the
> frequency reduction.)  It would doubtless be the same on aarch64, say,
> but I only have the POWER.
>
> Thanks for any info.
>
