On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Hi Jeff
Many thanks for looking into this and filing a bug report at 11:16PM!
Thanks to Aurelien, Ralph and Nathan for their help and clarifications
also.
**
Related suggestion:
Add a note to the FAQ explaining that in OMPI 1.8
the new (default) btl is vader (and what it is).
It was a real surprise to me.
If Aurelien Bouteiller had not told me about vader,
I might never have realized it even existed.
That could be part of one of the existing FAQ entries
explaining how to select the btl.
**
Doubts (btl in OMPI 1.8):
I still don't understand clearly the meaning and scope of vader
being a "default btl".
We mean that it has a higher priority than the other shared memory
implementation, and so it will be used for intra-node messaging by
default.
What is the scope of this default: the intra-node btl only, perhaps?
Yes - strictly intra-node
Was there a default btl before vader, and if so, which one?
The "sm" btl was the default shared memory transport before vader
Is vader the intra-node default only (i.e. replaces sm by default),
Yes
or does it somehow extend beyond node boundaries and replace (or
bring in) network btls (openib, tcp, etc.)?
Nope - just intra-node
If I am running on several nodes, and want to use openib, not tcp,
and, say, use vader, what is the right syntax?
* nothing (OMPI will figure it out ... but what if you have
IB, Ethernet, Myrinet, and OpenGM all together?)
If you have higher-speed connections, we will pick the fastest for
inter-node messaging as the "default" since we expect you would want the
fastest possible transport.
* -mca btl openib (and vader will come along automatically)
Among the ones you show, these would indeed be the likely choices (openib
and vader)
* -mca btl openib,self (and vader will come along automatically)
The "self" btl is *always* active as the loopback transport
* -mca btl openib,self,vader (because vader is default only for 1-node
jobs)
* something else (or several alternatives)
Whatever happened to the "self" btl in this new context?
Gone? Still there?
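For reference, the most explicit of the alternatives listed above can be sketched as a command line. This is only an illustration, not a recommendation from anyone in this thread: the process count and the application name (./my_mpi_app) are hypothetical placeholders, and the script prints the command instead of running it so it can be tried on a machine without Open MPI installed. It names every transport (openib for inter-node, vader for intra-node, self for loopback) so that nothing depends on what is selected by default.

```shell
#!/bin/sh
# Sketch: build an mpirun command line with an explicit btl list.
# NP and APP are hypothetical placeholders; adjust for the real job.
NP=16
APP=./my_mpi_app

# openib = inter-node (InfiniBand), vader = intra-node shared memory,
# self = loopback. Listing all three leaves nothing to default selection.
BTL_LIST="openib,self,vader"

CMD="mpirun -np $NP --mca btl $BTL_LIST $APP"

# Dry run: only print the command. On a system with Open MPI installed,
# run $CMD directly instead of echoing it.
echo "$CMD"
```

To fall back to the older shared memory transport, replacing vader with sm in BTL_LIST should be enough, assuming the sm btl was built into the installation.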
Many thanks,
Gus Correa
On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
and in the MCA parameter file:
btl_sm_use_knem = 1
I think the logic enforcing this MCA param got broken when we revamped
the MCA param system. :-(
I am scratching my head to understand why a parameter with such a
suggestive name ("btl_sm_have_knem_support"),
so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
somehow vanished from ompi_info in OMPI 1.8.3.
It looks like this MCA param was also dropped when we revamped the MCA
system. Doh! :-(
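One way to check which knem-related parameters a given installation still reports is to filter the ompi_info output for the sm btl. The following is a minimal sketch under a few assumptions: ompi_info is in the PATH, the --level 9 option (available since the MCA revamp) is used so that parameters at every level are shown, and the helper name list_knem_params is made up for this example. If nothing is printed, the installation reports no knem parameters for sm, which matches the behavior described above.

```shell
#!/bin/sh
# Sketch: list knem-related MCA parameters of the sm btl.
# list_knem_params is a hypothetical helper: it keeps only the lines
# of its standard input that mention "knem" (case-insensitive).
list_knem_params() {
    grep -i knem
}

if command -v ompi_info >/dev/null 2>&1; then
    # --level 9 asks ompi_info to show parameters at all levels.
    ompi_info --param btl sm --level 9 | list_knem_params
else
    echo "ompi_info not found in PATH" >&2
fi
```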
There's some deep mojo going on that is somehow causing knem to not be
used; I'm too tired to understand the logic right now. I just opened
https://github.com/open-mpi/ompi/issues/239 to track this issue --
feel free to subscribe to the issue to get updates.