George,

Thanks for your quick answer.
I asked about a "simple synchronized reduction implementation" because I am using a simple (and therefore very fast) MPI communications simulator that models all collectives as synchronized collectives, and I see huge differences between the real and the simulated executions caused by the reductions. Now that I know that in reality MPI_Reduce never synchronizes, it might be a good idea to try to model an approximation of the real behaviour. Is there any place where I can find documentation about the different algorithms that are implemented for MPI_Reduce?

- Fran

On Fri, 2016-07-08 at 15:40 +0200, George Bosilca wrote:
> On Jul 8, 2016 3:16 PM, "Juan Francisco Martínez" <juan.francisco.marti...@est.fib.upc.edu> wrote:
> >
> > Hi everybody!
> >
> > First of all I want to congratulate all of you on the quality of the community; I have resolved a lot of doubts just by reading the mailing list.
> >
> > However, I have a question that I cannot solve... Until now I thought that all the collective operations had an implicit synchronization, but I can see that this is not true at all (because of optimizations?). After searching a little on the web, I saw that there are several implementations of the reduction in Open MPI; in fact there are six possible algorithms (at least in OMPI 1.6) that you can select by means of the MCA parameters...
> >
> > I thought that one of them would behave as a synchronization, but after running a test with each one, none of them does. So my question is: is there any possibility, by tuning OMPI, to make the reduce operation synchronize all the ranks involved at the end of the operation?
>
> The straightforward answer is that the reduction provides a logical synchronization only between the root of the reduction and each one of the participants individually.
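George's point (that a reduce only orders each participant with respect to the root) is also the likely source of the gap between the real runs and a simulator that models every collective as synchronized. As a minimal sketch: a toy cost model of a binomial-tree reduce, where every message hop costs a fixed `latency`. The function name and the one-hop-per-round model are illustrative assumptions, not Open MPI's actual implementation:

```python
def binomial_reduce_finish_times(nprocs, latency=1.0):
    """Per-rank finish times for a toy binomial-tree reduce to rank 0.

    Model: every round takes `latency`; in round k each still-active rank
    whose k-th bit is set sends its partial result to rank (r - 2**k) and
    is then finished, while the receiving rank continues to the next round.
    """
    finish = {0: 0.0}
    active = list(range(nprocs))
    k = 0
    while len(active) > 1:
        for r in active:
            if (r >> k) & 1:
                finish[r] = (k + 1) * latency  # done right after its send
        active = [r for r in active if not (r >> k) & 1]
        k += 1
    finish[0] = k * latency  # root waits for every round
    return [finish[r] for r in range(nprocs)]

# A synchronized model releases every rank when the root is done;
# the tree model releases the leaves much earlier.
times = binomial_reduce_finish_times(8)
print(times)       # [3.0, 1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0]
print(max(times))  # 3.0 -- what a synchronized model charges to every rank
```

Under this model a leaf rank is released after a single hop, while a synchronized model holds every rank for the full tree depth, which is exactly the kind of overestimate the simulator would produce.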
> As you already noticed, this is not the case from a practical perspective, because different underlying algorithms can be used, and they use different communication patterns. Thus, you cannot, and should not, draw a parallel between a reduction and a synchronization.
>
> If you really need the synchronization behavior, why don't you use allreduce instead? Or at least a bcast of a single byte after the reduction (it also works with a barrier, but as you already have half of the synchronization, i.e. all-to-root, that would be overkill).
>
> > Also I would like to know if there is any mechanism to know at runtime which algorithm is being used.
>
> Again, there is no simple answer. Even if the tuned collective module could expose the algorithm, how do you know that a particular collective will be using the tuned module? We order the collective modules by priority, and the decision of which module will handle each collective is dynamic, based on many factors.
>
>   George
>
> > Thanks for all!
> > - Fran
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2016/07/29606.php
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/07/29607.php
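George's suggestion of a one-byte bcast after the reduce can be sketched with the same kind of toy cost model (again an illustrative assumption of binomial trees with a fixed per-hop `latency`, not Open MPI's actual algorithm): the broadcast travels back down the tree from the root, so no rank is released before the root holds the result, which is the synchronization the simulator assumes.

```python
import math

def reduce_then_bcast_finish_times(nprocs, latency=1.0):
    """Toy model: binomial-tree reduce to rank 0, then a one-byte
    binomial-tree bcast from rank 0; every message hop costs `latency`.

    In the bcast, rank r (r > 0) is reached r.bit_length() hops after
    the root starts broadcasting (rank 1 in round 0, ranks 2-3 by round
    1, ranks 4-7 by round 2, and so on).
    """
    rounds = math.ceil(math.log2(nprocs)) if nprocs > 1 else 0
    root_has_result = rounds * latency  # the reduce completes at the root
    return [root_has_result + (r.bit_length() if r else 0) * latency
            for r in range(nprocs)]

times = reduce_then_bcast_finish_times(8)
print(times)       # [3.0, 4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0]
print(min(times))  # 3.0 -- no rank is released before the root has the result
```

Under this model the bcast pays only the missing root-to-all half, whereas a barrier would pay both halves again, which is why George calls the barrier overkill: the reduce itself already provides the all-to-root half.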