Hi Jeff
Many thanks for looking into this and filing a bug report at 11:16PM!
Thanks also to Aurelien, Ralph and Nathan for their help and
clarifications.
**
Related suggestion:
Add a note to the FAQ explaining that in OMPI 1.8
the new (default) btl is vader (and what it is).
It was a real surprise to me.
If Aurelien Bouteiller hadn't told me about vader,
I might never have realized it even existed.
That could be part of one of the existing FAQ entries
explaining how to select the btl.
**
Doubts (btl in OMPI 1.8):
I still don't clearly understand the meaning and scope of vader
being a "default btl".
What is the scope of this default: intra-node only, perhaps?
Was there a default btl before vader, and if so, which one?
Is vader the intra-node default only (i.e. it replaces sm by default),
or does it somehow extend beyond node boundaries and replace (or
bring in) network btls (openib, tcp, etc.)?
If I am running on several nodes, and want to use openib, not tcp,
and, say, use vader, what is the right syntax?
* nothing (OMPI will figure it out ... but what if you have
IB, Ethernet, Myrinet, OpenGM, all together?)
* -mca btl openib (and vader will come along automatically)
* -mca btl openib,self (and vader will come along automatically)
* -mca btl openib,self,vader (because vader is default only for 1-node jobs)
* something else (or several alternatives)
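To make the options above concrete, this is roughly how I would spell them out on the command line. The application name (./my_mpi_app), the process count (16), and the little helper btl_cmd are placeholders of mine, not anything from OMPI. The script only prints the candidate commands and does not run them, and I am not claiming any one of them is the right answer.

```shell
#!/bin/sh
# Print (do not run) the mpirun command line for a given btl list.
# APP and NP are placeholders, not values from a real job.
APP=./my_mpi_app
NP=16

btl_cmd() {
    # $1 = comma-separated btl list; an empty string means
    # "pass no -mca btl option and let OMPI choose".
    if [ -z "$1" ]; then
        echo "mpirun -np $NP $APP"
    else
        echo "mpirun -np $NP -mca btl $1 $APP"
    fi
}

btl_cmd ""                    # nothing: OMPI figures it out
btl_cmd "openib"              # openib only
btl_cmd "openib,self"         # openib plus self
btl_cmd "openib,self,vader"   # everything listed explicitly
```

Which of these lines actually ends up using vader for the intra-node traffic is exactly what I am asking.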
Whatever happened to the "self" btl in this new context?
Gone? Still there?
Many thanks,
Gus Correa
On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
> On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> and on the MCA parameter file:
>>
>> btl_sm_use_knem = 1
>
> I think the logic enforcing this MCA param got broken when we revamped the MCA
> param system. :-(
>
>> I am scratching my head to understand why a parameter with such a
>> suggestive name ("btl_sm_have_knem_support"),
>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>> somehow vanished from ompi_info in OMPI 1.8.3.
>
> It looks like this MCA param was also dropped when we revamped the MCA system.
> Doh! :-(
>
> There's some deep mojo going on that is somehow causing knem to not be used;
> I'm too tired to understand the logic right now. I just opened
> https://github.com/open-mpi/ompi/issues/239 to track this issue -- feel free to
> subscribe to the issue to get updates.