Hello all,

I am currently profiling a simple case where I replace multiple send/receive (S/R) calls with Allgather calls, and it would _seem_ that the simple S/R calls are faster. *Before* I draw any conclusion from this, one piece I am missing is more detail on how, if, and when the tuned coll MCA component is selected. In other words, can I assume the tuned versions are used by default? I have skimmed through the well-documented source code, but before I can even start to analyze the impact of the replacement (on a small cluster), I need to know how and when the tuned coll MCA component is selected and used.

Thanks,

Eric
