George,

Thanks for your quick answer.

I asked about a "simple synchronized reduction implementation" because I
am using a simple (and therefore very fast) MPI communication
simulator that models all collectives as synchronized collectives, and I
observe large differences between the real and the simulated
execution because of the reductions.

Now that I know that in reality there is no case in which MPI
reductions synchronize, maybe it would be a good idea to try to model
an approximation of the real behaviour.

Is there any place where I can find documentation about the different
algorithms that are implemented for MPI_Reduce?
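So far the closest I have found (I am not certain these parameter names are stable across versions) is that ompi_info can list the tuned collective component's parameters, and a specific reduce algorithm can apparently be forced through MCA parameters:

```shell
# List the tuned collective component's parameters, including the
# available reduce algorithms (output varies by Open MPI version).
ompi_info --param coll tuned

# Force a particular reduce algorithm; 0 means let Open MPI decide.
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_reduce_algorithm 2 \
       -np 4 ./my_app
```

But I have not found a description of what each algorithm number actually does, beyond reading the source.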

- Fran

On Fri, 2016-07-08 at 15:40 +0200, George Bosilca wrote:
> 
> On Jul 8, 2016 3:16 PM, "Juan Francisco Martínez" <
> juan.francisco.marti...@est.fib.upc.edu> wrote:
> >
> > Hi everybody!
> >
> > First of all, I want to congratulate all of you on the quality of
> > the community; I have solved a lot of doubts just by reading the
> > mailing list.
> >
> > However, I have a question that I cannot solve... Until now I
> > thought that all the collective operations had an implicit
> > synchronization, but I can see that this is not true at all
> > (because of optimizations?). Then, after searching a little on the
> > web, I saw that there are several implementations of the reduction
> > in Open MPI; in fact there are six possible algorithms (at least in
> > OMPI 1.6) that you can select by means of the MCA parameters...
> >
> > I thought that one of them would behave as a synchronization, but
> > after running a test with each one, none of them does. So my
> > question is: is there any possibility, by tuning OMPI, that the
> > reduce operation synchronizes all the ranks involved at the end of
> > the operation?
> The straightforward answer is that the reduction provides a logical
> synchronization only between the root of the reduction and each one
> of the participants individually.
> As you already noticed, this is not the case from a practical
> perspective, because different underlying algorithms can be used, and
> they use different communication patterns. Thus, you cannot, and you
> should not, draw a parallel between a reduction and a
> synchronization.
> If you really need the synchronization behavior, why don't you use
> allreduce instead? Or at least a bcast of a single byte after the
> reduction (a barrier also works, but as you already have half of the
> synchronization, i.e. all-to-root, it would be overkill).
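[Just to check I understood the bcast trick correctly: something like this,
right? A rough, untested sketch with error handling omitted.]

```c
#include <mpi.h>

/* Reduce followed by a 1-byte broadcast from the same root: no rank can
 * return from the broadcast before the root has entered it, and the root
 * only enters it after the reduction has delivered the result, so every
 * rank leaves this function only after the root holds the reduced value. */
void reduce_with_sync(const double *sendbuf, double *recvbuf, int count,
                      int root, MPI_Comm comm)
{
    char token = 0;
    MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, root, comm);
    MPI_Bcast(&token, 1, MPI_BYTE, root, comm);
}
```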
> >
> > Also, I would like to know if there is any mechanism to know at
> > runtime which algorithm is being used.
> Again, there is no simple answer. Even if the tuned collective
> module could expose the algorithm, how do you know that a particular
> collective will be using the tuned module? We order the collective
> modules by priority, and the decision of which module will handle
> each collective is dynamic, based on many factors.
> George
> >
> > Thanks for all!
> > - Fran
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/07/29606.php
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/07/29607.php