You are welcome to provide any data showing that the current implementation (intrinsics, AVX512) is not the most efficient, and you are free to issue a Pull Request to suggest a better one.
The op/avx component has pretty much nothing to do with scalability: only one node is required
Gilles Gouaillardet via users writes:
> One motivation is packaging: a single Open MPI implementation has to be
> built, that can run on older x86 processors (supporting only SSE) and the
> latest ones (supporting AVX512).
I take dispatch on micro-architecture for granted, but it doesn't require
One motivation is packaging: a single Open MPI implementation has to be
built, that can run on older x86 processors (supporting only SSE) and the
latest ones (supporting AVX512). The op/avx component will select at
runtime the most efficient implementation for vectorized reductions.
On Mon, Jul 19
I meant to ask a while ago about vectorized reductions, after I saw a paper that I can't now find. I didn't understand what was behind it. Can someone explain why you need to hand-code the AVX implementations of the reduction operations now used on x86_64? As far as I remember, the paper didn't j