I meant to ask a while ago about vectorized reductions, after I saw a
paper that I now can't find.  I didn't understand the reasoning behind it.

Can someone explain why you need to hand-code the AVX implementations of
the reduction operations now used on x86_64?  As far as I remember, the
paper didn't justify the effort beyond alluding to a compiler being unable
to vectorize reductions.  I wonder which compiler(s) it meant; the recent
ones I'm familiar with certainly can if you allow them (or don't stop
them; icc, sigh).  I've been assured before that GCC can't, but that's
probably due to compiling with the default strict IEEE floating-point
semantics, which forbid reordering the additions, and/or not
restrict-qualifying the function arguments.  So I wonder what the problem
is with just using C and, if necessary, a tolerably recent GCC.  Is there
something else behind this?
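For concreteness, here is a minimal sketch of the kind of plain C I mean.
The function names are just illustrative, not taken from the paper or any
library.  The point is that the loops are ordinary serial reductions; it is
the compiler flags, not the source, that decide whether GCC may vectorize
them.

```c
#include <stddef.h>

/* Plain C sum reduction.  Under default (strict IEEE) semantics the
   compiler must add the elements in source order, which serialises the
   loop.  Allowing reassociation (e.g. -ffast-math, or -fassociative-math
   together with -fno-signed-zeros and -fno-trapping-math) lets GCC keep
   several partial sums in vector lanes and combine them at the end. */
double sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Dot product with restrict-qualified arguments.  For a loop that only
   reads, restrict mainly documents intent; it matters more when a function
   also stores through a pointer, since it lets the compiler skip runtime
   alias checks. */
double dot(const double *restrict a, const double *restrict b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

Something like `gcc -O3 -ffast-math -march=native -fopt-info-vec-optimized`
(with `-mcpu=native` instead of `-march=native` on POWER) should report
which loops were vectorized; without the reassociation flags the reduction
loops typically stay scalar.  Bear in mind that reassociation can change the
rounding of the result slightly compared with the serial sum.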

Since only x86 is supported, I tried ppc64le and, with minimal effort, saw
GCC vectorize more of the base implementation functions than the AVX
version covers.  The same held for x86 micro-architectures.  (I'd need
convincing that AVX-512 is worth the frequency reduction.)  It would
doubtless be the same on aarch64, say, but I only have the POWER.

Thanks for any info.
