Dear all,

I would like to ask about collective communication. With debug mode enabled, I can see a lot of information during execution, such as which algorithm is used. However, I would like to force a specific algorithm (the simplest one, I suppose). I am profiling some applications and want to simulate them with another program, so I need to know exactly what, for example, MPI_Allreduce is doing. I saw that there are many algorithms, chosen according to the message size and the number of processors, so I would like to ask:

1) How can I tell Open MPI to use a simple algorithm for allreduce? (Is there a way to make it use the simplest algorithm for all collective operations?) Basically, I would like to know the root CPU for every collective communication. What are the disadvantages of requesting the simplest algorithm?

2) Is there any overhead from having built Open MPI in debug mode, even if I run a program without any --mca flags?

3) How would you describe allreduce in words? Can we say that the root CPU performs a reduce and then a broadcast? Is that correct for your implementation? I saw that which CPU acts as the root depends on the algorithm, so is it possible to choose an algorithm for which I know that the CPU with rank 0 is always the root?

Thanks a lot,
George
