Re: [OMPI users] using specific algorithm for collective communication, and knowing the root cpu?

George Bosilca Tue, 3 Nov 2009 12:09:28 -0500

You can add the following MCA parameters either on the command line or in the $(HOME)/.openmpi/mca-params.conf file.


On Nov 2, 2009, at 08:52 , George Markomanolis wrote:

Dear all,
I would like to ask about collective communication. With debug mode enabled, I can see a lot of information during execution, such as which algorithm is used. But my question is that I would like to use a specific algorithm (the simplest, I suppose). I am profiling some applications and I want to simulate them with another program, so I must be able to know, for example, what mpi_allreduce is doing. I saw many algorithms that depend on the message size and the number of processors, so I would like to ask:
1) What is the way to tell Open MPI to use a simple algorithm for allreduce (is there any way to tell it to use the simplest algorithm for all the collective communication)? Basically I would like to know the root cpu for every collective communication. What are the disadvantages of demanding the simplest algorithm?

coll_tuned_use_dynamic_rules=1 to allow you to manually set the algorithms to be used.
coll_tuned_allreduce_algorithm=*something between 0 and 5* to describe the algorithm to be used.
For the simplest algorithm I guess you will want to use 1 (star-based fan-in fan-out).
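
As a sketch, the two parameters above could be placed in the per-user parameter file like this. The choice of algorithm 1 follows the guess in this reply; the process count and the program name ./my_app in the commented command line are placeholders, not anything from the original question. Running ompi_info --param coll tuned should list the values your build actually accepts.

```
# $HOME/.openmpi/mca-params.conf
# Allow the tuned collective component to honor manual algorithm choices.
coll_tuned_use_dynamic_rules = 1
# 1 = the simple star-based fan-in fan-out allreduce (per the reply; check your build).
coll_tuned_allreduce_algorithm = 1

# Command-line equivalent (process count and ./my_app are placeholders):
#   mpirun --mca coll_tuned_use_dynamic_rules 1 \
#          --mca coll_tuned_allreduce_algorithm 1 -np 4 ./my_app
```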

The main disadvantage is that the cost of the allreduce will rise, which will negatively impact the overall performance of the application.

2) Is there any overhead because I installed Open MPI in debug mode, even if I just run a program without any --mca flag?

There is a lot of overhead because you compiled in debug mode. We do a lot of extra tracking of internally allocated memory, checks on most/all internal objects, and so on. Based on previous results I would say your latency increases by about 2-3 microseconds, but the impact on the bandwidth is minimal.

3) How would you describe allreduce in words? Can we say that the root cpu does a reduce and then a broadcast? I mean, is that right for your implementation? I saw that which cpu is the root depends on the algorithm, so is it possible to use an algorithm where I will know every time that the cpu with rank 0 is the root?

Exactly, allreduce = reduce + bcast (and btw this is what the basic algorithm will do). However, there is no root in an allreduce, as all processors execute symmetric work. Of course, if one sees the allreduce as a reduce followed by a broadcast, then one has to select a root (I guess we pick rank 0 in our implementation).
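
For simulation purposes, the reduce-then-broadcast view can be sketched outside MPI as a small message-pattern model. This is only an illustration, not Open MPI's code: the function name simulate_star_allreduce and the (phase, source, destination) message log are made up here, and rank 0 as the root is the assumption guessed at above.

```python
from functools import reduce
import operator


def simulate_star_allreduce(values, op=operator.add, root=0):
    """Model allreduce as a linear reduce to `root` plus a linear broadcast.

    values[i] is the contribution of rank i. Returns (results, messages):
    results[i] is what rank i holds afterwards, and messages lists the
    point-to-point transfers as (phase, source, destination) tuples.
    """
    if not values:
        raise ValueError("need at least one rank")
    n = len(values)
    messages = []

    # Fan-in: every non-root rank sends its contribution to the root.
    for rank in range(n):
        if rank != root:
            messages.append(("reduce", rank, root))

    # The root combines all contributions (here in rank order).
    total = reduce(op, values)

    # Fan-out: the root sends the combined result back to every other rank.
    for rank in range(n):
        if rank != root:
            messages.append(("bcast", root, rank))

    return [total] * n, messages


if __name__ == "__main__":
    results, messages = simulate_star_allreduce([1, 2, 3, 4])
    print(results)        # [10, 10, 10, 10]
    print(len(messages))  # 6
```

With n ranks this pattern generates 2(n-1) messages, all of which pass through the root, which is one way to see why the simplest algorithm costs more than the tuned ones.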


  george.


Thanks a lot,
George
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
