I want to close the loop on this issue. 1.8.5 will address it in several ways:

- knem support in btl/sm has been fixed. A sanity check was disabling
  knem during component registration. I wrote the sanity check before
  the 1.7 release and didn't intend this side-effect.

- vader now supports xpmem, cma, and knem. The best available
  single-copy mechanism will be used. If multiple single-copy
  mechanisms are available you can select which one you want to use at
  runtime.

More about the vader btl can be found here:
http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/

-Nathan Hjelm
HPC-5, LANL

On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
> On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
> > Hi Jeff
> >
> > Many thanks for looking into this and filing a bug report at 11:16PM!
> >
> > Thanks to Aurelien, Ralph and Nathan for their help and clarifications
> > also.
> >
> > **
> >
> > Related suggestion:
> >
> > Add a note to the FAQ explaining that in OMPI 1.8
> > the new (default) btl is vader (and what it is).
> >
> > It was a real surprise to me.
> > If Aurelien Bouteiller didn't tell me about vader,
> > I might have never realized it even existed.
> >
> > That could be part of one of the already existent FAQs
> > explaining how to select the btl.
> >
> > **
> >
> > Doubts (btl in OMPI 1.8):
> >
> > I still don't understand clearly the meaning and scope of vader
> > being a "default btl".
>
> We mean that it has a higher priority than the other shared memory
> implementation, and so it will be used for intra-node messaging by
> default.
>
> > Which is the scope of this default: intra-node btl only perhaps?
>
> Yes - strictly intra-node
>
> > Was there a default btl before vader, and which?
>
> The "sm" btl was the default shared memory transport before vader
>
> > Is vader the intra-node default only (i.e. replaces sm by default),
>
> Yes
>
> > or does it somehow extend beyond node boundaries, and replaces (or
> > brings in) network btls (openib,tcp,etc) ?
>
> Nope - just intra-node
>
> > If I am running on several nodes, and want to use openib, not tcp,
> > and, say, use vader, what is the right syntax?
> >
> > * nothing (OMPI will figure it out ... but what if you have
> >   IB,Ethernet,Myrinet,OpenGM, altogether?)
>
> If you have higher-speed connections, we will pick the fastest for
> inter-node messaging as the "default" since we expect you would want the
> fastest possible transport.
>
> > * -mca btl openib (and vader will come along automatically)
>
> Among the ones you show, this would indeed be the likely choices (openib
> and vader)
>
> > * -mca btl openib,self (and vader will come along automatically)
>
> The "self" btl is *always* active as the loopback transport
>
> > * -mca btl openib,self,vader (because vader is default only for 1-node
> >   jobs)
> > * something else (or several alternatives)
> >
> > Whatever happened to the "self" btl in this new context?
> > Gone? Still there?
> >
> > Many thanks,
> > Gus Correa
> >
> > On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
> > > On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> > >
> > > > and on the MCA parameter file:
> > > >
> > > > btl_sm_use_knem = 1
> > >
> > > I think the logic enforcing this MCA param got broken when we revamped
> > > the MCA param system. :-(
> > >
> > > > I am scratching my head to understand why a parameter with such a
> > > > suggestive name ("btl_sm_have_knem_support"),
> > > > so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
> > > > somehow vanished from ompi_info in OMPI 1.8.3.
> > >
> > > It looks like this MCA param was also dropped when we revamped the MCA
> > > system. Doh! :-(
> > >
> > > There's some deep mojo going on that is somehow causing knem to not be
> > > used; I'm too tired to understand the logic right now. I just opened
> > > https://github.com/open-mpi/ompi/issues/239 to track this issue --
> > > feel free to subscribe to the issue to get updates.
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/10/25532.php
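
For anyone who wants to try the runtime selection mentioned at the top of this message, here is a rough dry-run sketch. It only prints an mpirun command line; it does not launch anything. Assumptions on my part, not stated in the thread: the vader parameter is named btl_vader_single_copy_mechanism (please confirm with "ompi_info --param btl vader --level 9" on your install), and ./my_mpi_app is a hypothetical application name.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) an mpirun command line that lists
# the BTLs explicitly and requests one vader single-copy mechanism.
# ASSUMPTION: the MCA parameter name btl_vader_single_copy_mechanism;
# check it with: ompi_info --param btl vader --level 9
build_mpirun_cmd() {
    btl_list=$1     # e.g. openib,self,vader
    mechanism=$2    # one of xpmem, cma, knem (the three named above)
    app=$3          # hypothetical application binary
    printf 'mpirun --mca btl %s --mca btl_vader_single_copy_mechanism %s %s\n' \
        "$btl_list" "$mechanism" "$app"
}

# Example: openib between nodes, vader with knem inside a node.
build_mpirun_cmd openib,self,vader knem ./my_mpi_app
```

Listing openib, self and vader explicitly is just one way to avoid relying on the defaults discussed in the quoted thread; paste the printed line into your shell once the parameter name is confirmed for your build.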