Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-30 Thread Felipe .
Thanks for the reply, Ralph. Now I think it is clearer to me why it could be so much slower. The reason would be that the blocking algorithm for reduction has an implementation very different from the non-blocking one. Since there are lots of ways to implement it, are there options to tune the non-blocking…

Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Ralph Castain
One thing you might want to keep in mind is that “non-blocking” doesn’t mean “asynchronous progress”. The API may not block, but the communications only progress when you actually call down into the library. So if you call a non-blocking collective and then make additional calls into…
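
A minimal sketch (not from the thread) of the pattern Ralph describes: keep calling back into MPI, for example via MPI_Test, while the local work runs, so the non-blocking reduction can actually make progress before the final wait. do_some_work() is a placeholder for a slice of the application's computation.

#include <mpi.h>

void do_some_work(void)
{
    /* placeholder: one slice of the application's local computation */
}

void overlap_reduction(double *buf, int count)
{
    MPI_Request req;
    int done = 0;

    MPI_Iallreduce(MPI_IN_PLACE, buf, count, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Without these periodic calls into the library, the reduction may make
       little or no progress until MPI_Wait/MPI_Test is finally called. */
    while (!done) {
        do_some_work();
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
}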

Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Felipe .
> Try to do a variable amount of work for every process; I see non-blocking
> as a way to speed up communication when the processes arrive at the call at
> different times. Please always have this at the back of your mind when
> doing this.

I tried to simplify the problem in the explanation. The "local_computation" is…

Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Nick Papior
Try to do a variable amount of work for every process; I see non-blocking as a way to speed up communication when the processes arrive at the call at different times. Please always have this at the back of your mind when doing this. Surely non-blocking has overhead, and if the communication time is low, so will the…
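
A small, self-contained way to check whether this situation applies (my sketch, not from the thread): give each rank a deliberately different amount of work and measure how long it then sits inside the blocking MPI_Allreduce. Large, rank-dependent times inside the call are exactly the idle time a non-blocking reduction could fill with useful work. busy_work() is a made-up stand-in for the real computation.

#include <mpi.h>
#include <stdio.h>

static double busy_work(int units)
{
    /* deliberately imbalanced placeholder work */
    double s = 0.0;
    for (long k = 0; k < (long)units * 10000000L; k++)
        s += 1e-9 * (double)k;
    return s;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double val = busy_work(rank + 1);   /* more work on higher ranks */

    double t0 = MPI_Wtime();
    MPI_Allreduce(MPI_IN_PLACE, &val, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double wait = MPI_Wtime() - t0;

    printf("rank %d spent %.6f s inside MPI_Allreduce\n", rank, wait);

    MPI_Finalize();
    return 0;
}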

[OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Felipe .
Hello! I have a program that basically is (first implementation):

for i in N:
    local_computation(i)
    mpi_allreduce(in_place, i)

In order to try to mitigate the implicit barrier of the mpi_allreduce, I tried to start an mpi_Iallreduce, like this (second implementation):

for i in N:
    local_comput…
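
For reference, a C sketch of the two patterns as I read them. The second implementation is cut off in the archive, so the non-blocking variant below is only one plausible reconstruction: the reduction of iteration i overlaps with the local work of iteration i+1, with a separate buffer per iteration so no buffer is touched while its reduction is in flight. N, COUNT and local_computation() are placeholders.

#include <mpi.h>

#define N     100        /* number of iterations (placeholder)            */
#define COUNT 1024       /* elements reduced per iteration (placeholder)  */

void local_computation(double *x)
{
    /* placeholder for the real per-iteration work filling x[0..COUNT-1] */
    for (int k = 0; k < COUNT; k++)
        x[k] = (double)k;
}

/* First implementation: blocking allreduce every iteration. */
void blocking_version(double results[N][COUNT])
{
    for (int i = 0; i < N; i++) {
        local_computation(results[i]);
        MPI_Allreduce(MPI_IN_PLACE, results[i], COUNT, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    }
}

/* Second implementation (one plausible reading): reduce iteration i in the
   background while iteration i+1 does its local work. */
void nonblocking_version(double results[N][COUNT])
{
    MPI_Request req = MPI_REQUEST_NULL;   /* MPI_Wait on a null request returns at once */
    for (int i = 0; i < N; i++) {
        local_computation(results[i]);
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* finish the reduction of i-1 */
        MPI_Iallreduce(MPI_IN_PLACE, results[i], COUNT, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);
    }
    MPI_Wait(&req, MPI_STATUS_IGNORE);        /* finish the last reduction   */
}

Note that, as discussed above in the thread, the overlapped reduction only progresses when the code calls back into MPI, so in practice the local work may need occasional MPI_Test calls on the outstanding request.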