I want to close the loop on this issue. 1.8.5 will address it in several
ways:

 - knem support in btl/sm has been fixed. A sanity check was disabling
   knem during component registration. I wrote the sanity check before
   the 1.7 release and didn't intend this side-effect.

 - vader now supports xpmem, cma, and knem. The best available
   single-copy mechanism will be used. If multiple single-copy
   mechanisms are available, you can select which one you want to use at
   runtime.
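
As a rough sketch of what that runtime selection could look like in an
MCA parameter file (same format as the btl_sm_use_knem line quoted
below): this assumes your build exposes the
btl_vader_single_copy_mechanism parameter and was compiled with cma
support. The value "cma" is only an illustration. Check
"ompi_info --param btl vader --level 9" for the values your install
actually accepts.

```
# Illustrative only: ask vader to use cma for single-copy transfers.
# Assumes this build of Open MPI has cma support compiled in.
btl_vader_single_copy_mechanism = cma

# From the thread below: ask btl/sm to use knem (only relevant if sm
# is selected instead of vader).
btl_sm_use_knem = 1
```

The same parameter can also be passed on the mpirun command line with
-mca, in the same way as the btl selection discussed below.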

More about the vader btl can be found here:
http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/

-Nathan Hjelm
HPC-5, LANL

On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
>      On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>      Hi Jeff
> 
>      Many thanks for looking into this and filing a bug report at 11:16PM!
> 
>      Thanks to Aurelien, Ralph and Nathan for their help and clarifications
>      also.
> 
>      **
> 
>      Related suggestion:
> 
>      Add a note to the FAQ explaining that in OMPI 1.8
>      the new (default) btl is vader (and what it is).
> 
>      It was a real surprise to me.
>      If Aurelien Bouteiller didn't tell me about vader,
>      I might have never realized it even existed.
> 
>      That could be part of one of the already existent FAQs
>      explaining how to select the btl.
> 
>      **
> 
>      Doubts (btl in OMPI 1.8):
> 
>      I still don't understand clearly the meaning and scope of vader
>      being a "default btl".
> 
>    We mean that it has a higher priority than the other shared memory
>    implementation, and so it will be used for intra-node messaging by
>    default.
> 
>      Which is the scope of this default: intra-node btl only perhaps?
> 
>    Yes - strictly intra-node
> 
>      Was there a default btl before vader, and which?
> 
>    The "sm" btl was the default shared memory transport before vader
> 
>      Is vader the intra-node default only (i.e. replaces sm  by default),
> 
>    Yes
> 
>      or does it somehow extend beyond node boundaries, and replaces (or
>      brings in) network btls (openib,tcp,etc) ?
> 
>    Nope - just intra-node
> 
>      If I am running on several nodes, and want to use openib, not tcp,
>      and, say, use vader, what is the right syntax?
> 
>      * nothing (OMPI will figure it out ... but what if you have
>      IB,Ethernet,Myrinet,OpenGM, altogether?)
> 
>    If you have higher-speed connections, we will pick the fastest for
>    inter-node messaging as the "default" since we expect you would want the
>    fastest possible transport.
> 
>      * -mca btl openib (and vader will come along automatically)
> 
>    Among the ones you show, this would indeed be the likely choices (openib
>    and vader)
> 
>      * -mca btl openib,self (and vader will come along automatically)
> 
>    The "self" btl is *always* active as the loopback transport
> 
>      * -mca btl openib,self,vader (because vader is default only for 1-node
>      jobs)
>      * something else (or several alternatives)
> 
>      Whatever happened to the "self" btl in this new context?
>      Gone? Still there?
> 
>      Many thanks,
>      Gus Correa
> 
>      On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
> 
>        On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
>          and on the MCA parameter file:
> 
>          btl_sm_use_knem = 1
> 
>        I think the logic enforcing this MCA param got broken when we revamped
>        the MCA param system.  :-(
> 
>          I am scratching my head to understand why a parameter with such a
>          suggestive name ("btl_sm_have_knem_support"),
>          so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>          somehow vanished from ompi_info in OMPI 1.8.3.
> 
>        It looks like this MCA param was also dropped when we revamped the MCA
>        system.  Doh!  :-(
> 
>        There's some deep mojo going on that is somehow causing knem to not be
>        used; I'm too tired to understand the logic right now.  I just opened
>        https://github.com/open-mpi/ompi/issues/239 to track this issue --
>        feel free to subscribe to the issue to get updates.
> 
>      _______________________________________________
>      users mailing list
>      us...@open-mpi.org
>      Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>      Link to this
>      post: http://www.open-mpi.org/community/lists/users/2014/10/25532.php

