You probably have this somewhere below, but what OS are you running? I have CentOS6, and vader works fine for me and is much faster than the sm btl.
I can certainly ask to see if someone has time to fix the knem support; if they do, we would definitely include the fix in the 1.8 series.

On Oct 16, 2014, at 4:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi All
>
> Back to the original issue of knem in Open MPI 1.8.3.
> It really seems to be broken.
>
> I launched the Intel MPI benchmarks (IMB) job both with
> '-mca btl ^vader,tcp', and with '-mca btl sm,self,openib'.
> Both syntaxes seem to have turned off vader (along with tcp),
> as shown in stderr by messages like this
> (I also used -mca btl_base_verbose 30):
>
> [1,11]<stddiag>:[node26:13439] mca: bml: Using sm btl to [[39251,1],0] on
> node node26
>
> *However*, in both cases /dev/knem continues to *show only zeros*.
>
> My conclusion is that knem seems not to be working
> at all in OMPI 1.8.3.
>
> That is a real pity, because without knem performance really suffers.
> I took a quick look at the Intel MPI benchmarks output
> using OMPI 1.6.5 with knem, and OMPI 1.8.3 where knem doesn't work (despite
> my attempts to make it work).
> The older OMPI with knem shows very good speedups.
> For instance, ping-pong on two processors, message size 256kB,
> OMPI 1.6.5+knem has a ~32% speedup w.r.t. OMPI 1.8.3.
>
>  #bytes  #repetitions  t[usec]  Mbytes/sec
>  262144           160    48.04     5203.93  (OMPI 1.6.5 + knem)
>  262144           160    63.72     3923.30  (OMPI 1.8.3, broken knem)
>
> Numbers like these don't give me any incentive to upgrade
> our production codes to OMPI 1.8.
> Will this be fixed in the next Open MPI 1.8 release?
>
> Thank you,
> Gus Correa
>
> PS - Many thanks to Aurelien Bouteiller for pointing out the existence
> of the vader btl. Without his tip I would still be on the dark side.
>
> On 10/16/2014 05:46 PM, Gus Correa wrote:
>>
>> On 10/16/2014 05:28 PM, Nathan Hjelm wrote:
>>> And it doesn't support knem at this time. Probably never will because of
>>> the existence of CMA.
>>>
>>> -Nathan
>>>
>>
>> Thanks, Nathan
>>
>> But for the benefit of mere mortals like me
>> who don't share the dark or the bright side of the force,
>> and just need to keep their MPI applications running in production mode,
>> hopefully with Open MPI 1.8,
>> can somebody explain more clearly what "vader" is about?
>>
>> Thank you,
>> Gus Correa
>>
>>
>>> On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
>>>> FWIW: vader is the default in 1.8
>>>>
>>>> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller
>>>> <boute...@icl.utk.edu> wrote:
>>>>
>>>>> Are you sure you are not using the vader BTL?
>>>>>
>>>>> Setting mca_btl_base_verbose and/or sm_verbose should spit out some
>>>>> knem initialization info.
>>>>>
>>>>> The Linux CMA (Cross Memory Attach) mechanism, which ships with most
>>>>> 3.1x Linux kernels, has similar features, and is also supported in sm.
>>>>>
>>>>> Aurelien
>>>>> --
>>>>> ~~~ Aurélien Bouteiller, Ph.D. ~~~
>>>>> ~ Research Scientist @ ICL ~
>>>>> The University of Tennessee, Innovative Computing Laboratory
>>>>> 1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
>>>>> tel: +1 (865) 974-9375 fax: +1 (865) 974-8296
>>>>> https://icl.cs.utk.edu/~bouteill/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Oct 16, 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>>>
>>>>>> Dear Open MPI developers
>>>>>>
>>>>>> Well, I just can't keep my promises for too long...
>>>>>> So, here I am pestering you again, although this time
>>>>>> it is not a request for more documentation.
>>>>>> Hopefully it is something more legit.
>>>>>>
>>>>>> I am having trouble using knem with Open MPI 1.8.3,
>>>>>> and need your help.
>>>>>>
>>>>>> I configured Open MPI 1.8.3 with knem.
>>>>>> I had done the same with some builds of Open MPI 1.6.5 before.
>>>>>>
>>>>>> When I build and launch the Intel MPI benchmarks (IMB)
>>>>>> with Open MPI 1.6.5,
>>>>>> 'cat /dev/knem'
>>>>>> starts showing non-zero-and-growing statistics right away.
>>>>>>
>>>>>> However, when I build and launch IMB with Open MPI 1.8.3,
>>>>>> /dev/knem shows only zeros,
>>>>>> no statistics growing, nothing.
>>>>>> Knem just seems to be completely asleep.
>>>>>>
>>>>>> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
>>>>>> at least not for me.
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> The runtime environment related to knem is set up the
>>>>>> same way on both OMPI releases.
>>>>>> I tried setting it up both on the command line:
>>>>>>
>>>>>> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
>>>>>>
>>>>>> and in the MCA parameter file:
>>>>>>
>>>>>> btl_sm_use_knem = 1
>>>>>> btl_sm_eager_limit = 32768
>>>>>> btl_sm_knem_dma_min = 1048576
>>>>>>
>>>>>> and the behavior is the same (i.e., knem is active in 1.6.5,
>>>>>> but doesn't seem to be used by 1.8.3, as indicated by the
>>>>>> /dev/knem statistics).
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
>>>>>>
>>>>>> #define OMPI_BTL_SM_HAVE_KNEM 1
>>>>>>
>>>>>> suggesting that both configurations picked up knem correctly.
>>>>>>
>>>>>> On the other hand, when I do 'ompi_info --all --all | grep knem',
>>>>>> OMPI 1.6.5 shows "btl_sm_have_knem_support":
>>>>>>
>>>>>> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data
>>>>>> source: default value) Whether this component supports the knem
>>>>>> Linux kernel module or not'
>>>>>>
>>>>>> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular
>>>>>> item ("btl_sm_have_knem_support"),
>>>>>> although the *other* 'btl sm knem' items are there,
>>>>>> namely "btl_sm_use_knem", "btl_sm_knem_dma_min",
>>>>>> "btl_sm_knem_max_simultaneous".
>>>>>>
>>>>>> I am scratching my head to understand why a parameter with such a
>>>>>> suggestive name ("btl_sm_have_knem_support"),
>>>>>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>>>>>> somehow vanished from ompi_info in OMPI 1.8.3.
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> Questions:
>>>>>>
>>>>>> - Am I doing something totally wrong,
>>>>>> perhaps with the knem runtime environment?
>>>>>>
>>>>>> - Was knem somehow phased out in 1.8.3?
>>>>>>
>>>>>> - Could there be a bad interaction with other runtime parameters that
>>>>>> somehow is knocking out knem in 1.8.3?
>>>>>> (FYI, besides knem, I'm just excluding the tcp btl, binding to
>>>>>> core, and reporting the bindings, which is exactly what I do on 1.6.5,
>>>>>> although the runtime parameter syntax has changed.)
>>>>>>
>>>>>> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
>>>>>> (i.e., a bug)
>>>>>>
>>>>>> - Is there a way to increase verbosity to detect if knem is being
>>>>>> used by OMPI?
>>>>>> That would certainly help to check what is going on.
>>>>>> I tried '-mca btl_base_verbose 30' but there was no trace of knem
>>>>>> in stderr/stdout of either 1.6.5 or 1.8.3.
>>>>>> So, the evidence I have that knem is
>>>>>> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
>>>>>> /dev/knem.
>>>>>>
>>>>>> ***
>>>>>>
>>>>>>
>>>>>> Thank you,
>>>>>> Gus Correa
>>>>>>
>>>>>> ***
>>>>>>
>>>>>> PS - As an aside, I also have some questions on the knem setup,
>>>>>> which I mostly copied from the knem web site
>>>>>> (hopefully Brice Goglin is listening...):
>>>>>>
>>>>>> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
>>>>>> or should it be larger/smaller/something else?
>>>>>> [OK, I know I should benchmark it, but exploring the whole parameter
>>>>>> space takes a long time, so why not ask?]
>>>>>>
>>>>>> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
>>>>>> [I think I read somewhere that this DMA engine offload
>>>>>> is an Intel thing, not AMD.]
>>>>>>
>>>>>> - How about btl_sm_knem_max_simultaneous?
>>>>>> That one is not mentioned on the knem web site.
>>>>>> Should I leave it at the default of zero or set it to 1? 2? 4?
>>>>>> Something else?
>>>>>>
>>>>>>
>>>>>> Thanks again,
>>>>>> Gus Correa
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2014/10/25512.php
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2014/10/25513.php
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2014/10/25515.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/10/25518.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25519.php
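Throughout this thread, the only evidence of whether knem is in use comes from comparing the /dev/knem statistics before and after a benchmark run. That before/after check can be scripted. What follows is a minimal POSIX shell sketch, not an Open MPI or knem tool: the function names (knem_snapshot, knem_counters_changed, check_knem_during) and the KNEM_DEV override are hypothetical, and the sketch assumes, as described in the thread, that 'cat /dev/knem' prints statistics that stay unchanged unless something uses the module. It is therefore best run on an otherwise idle node.

```shell
#!/bin/sh
# Sketch: report whether the knem kernel module's statistics changed
# while a command (e.g. an MPI benchmark) ran.
# Assumptions (hypothetical helpers, not part of Open MPI or knem):
#   - `cat /dev/knem` prints usage statistics, as described in the thread;
#   - any difference between two snapshots means knem was used meanwhile.
# KNEM_DEV may point at another file (e.g. for testing); default /dev/knem.

# Print the current knem statistics to stdout.
knem_snapshot() {
    cat "${KNEM_DEV:-/dev/knem}"
}

# Return 0 if the two snapshot files differ (statistics moved), 1 otherwise.
knem_counters_changed() {
    ! cmp -s "$1" "$2"
}

# Snapshot, run "$@", snapshot again, and report whether anything changed.
check_knem_during() {
    knem_dev=${KNEM_DEV:-/dev/knem}
    # An unreadable device would yield two empty snapshots and a false
    # "unchanged", so refuse to run in that case.
    if [ ! -r "$knem_dev" ]; then
        echo "cannot read $knem_dev (is the knem module loaded?)" >&2
        return 1
    fi
    knem_before=$(mktemp) || return 1
    knem_after=$(mktemp) || return 1
    knem_snapshot > "$knem_before"
    "$@"
    knem_snapshot > "$knem_after"
    if knem_counters_changed "$knem_before" "$knem_after"; then
        echo "knem statistics changed"
    else
        echo "knem statistics unchanged"
    fi
    rm -f "$knem_before" "$knem_after"
}
```

For example, 'check_knem_during mpirun -np 2 -mca btl sm,self ./IMB-MPI1 PingPong' would run the IMB ping-pong test over the sm BTL only (which also keeps vader out of the picture) and then report whether the knem statistics moved. Comparing whole snapshots with cmp avoids assuming anything about the layout of the /dev/knem output.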