Re: [OMPI users] vectorized reductions

2021-07-20 Thread Gilles Gouaillardet via users
You are welcome to provide any data that evidences the current implementation (intrinsics, AVX512) is not the most efficient, and you are free to issue a Pull Request in order to suggest a better one. The op/avx component has pretty much nothing to do with scalability: only one node is req

Re: [OMPI users] vectorized reductions

2021-07-20 Thread Dave Love via users
Gilles Gouaillardet via users writes:

> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).

I take dispatch on micro-architecture for granted, but it doesn't require

Re: [OMPI users] vectorized reductions

2021-07-19 Thread Gilles Gouaillardet via users
One motivation is packaging: a single Open MPI implementation has to be built, that can run on older x86 processors (supporting only SSE) and the latest ones (supporting AVX512). The op/avx component will select at runtime the most efficient implementation for vectorized reductions. On Mon, Jul 19
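For readers unfamiliar with the idea, the runtime selection described above can be sketched roughly as follows. This is only an illustration, not the op/avx component's actual code: the names sum_fn, sum_scalar, sum_avx2 and select_sum are hypothetical, and the sketch assumes GCC or Clang on x86_64 (it uses their __builtin_cpu_supports builtin and target attribute). On other compilers or architectures it falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT Open MPI's op/avx code. */
typedef void (*sum_fn)(const int32_t *in, int32_t *inout, size_t n);

/* Portable baseline: inout[i] += in[i] for every element. */
static void sum_scalar(const int32_t *in, int32_t *inout, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        inout[i] += in[i];
    }
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* AVX2 variant: compiled for AVX2 regardless of the global -march flags,
 * so the rest of the binary still runs on SSE-only processors. */
__attribute__((target("avx2")))
static void sum_avx2(const int32_t *in, int32_t *inout, size_t n)
{
    size_t i = 0;
    /* Process eight 32-bit integers per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(inout + i));
        _mm256_storeu_si256((__m256i *)(inout + i), _mm256_add_epi32(a, b));
    }
    /* Scalar tail for the remaining elements. */
    for (; i < n; i++) {
        inout[i] += in[i];
    }
}
#endif

/* Pick an implementation once, based on what the running CPU reports. */
static sum_fn select_sum(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return sum_avx2;
    }
#endif
    return sum_scalar;
}
```

A caller would obtain the function pointer once with select_sum() and then invoke it for each reduction, so the same binary uses the wider instructions only where the processor supports them.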

[OMPI users] vectorized reductions

2021-07-19 Thread Dave Love via users
I meant to ask a while ago about vectorized reductions after I saw a paper that I can't now find. I didn't understand what was behind it. Can someone explain why you need to hand-code the avx implementations of the reduction operations now used on x86_64? As far as I remember, the paper didn't j