You probably have this somewhere below, but what OS are you running? I have 
CentOS 6, and vader works fine for me and is much faster than the sm btl.

I can certainly ask to see if someone has time to fix the knem support - if 
they do, we would definitely include the fix in the 1.8 series.


On Oct 16, 2014, at 4:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi All
> 
> Back to the original issue of knem in Open MPI 1.8.3.
> It really seems to be broken.
> 
> I launched the Intel MPI benchmarks (IMB) job both with
> '-mca btl ^vader,tcp', and with '-mca btl sm,self,openib'.
> Both syntaxes seem to have turned off vader (along with tcp),
> as shown in stderr by messages like this
> (I also used -mca btl_base_verbose 30):
> 
> [1,11]<stddiag>:[node26:13439] mca: bml: Using sm btl to [[39251,1],0] on 
> node node26
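> 
> For the record, here is a sketch of the two command lines I used (the
> process count, hostfile, and executable below are placeholders, not the
> exact ones from my runs):
> 
>   mpirun -np 16 -hostfile my_hosts -mca btl ^vader,tcp \
>          -mca btl_base_verbose 30 ./IMB-MPI1 PingPong
> 
>   mpirun -np 16 -hostfile my_hosts -mca btl sm,self,openib \
>          -mca btl_base_verbose 30 ./IMB-MPI1 PingPong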
> 
> *However*, in both cases /dev/knem continues to *show only zeros*.
> 
> My conclusion is that knem seems not to be working
> at all in OMPI 1.8.3.
> 
> That is a real pity, because without knem performance really suffers.
> I took a quick look at the Intel MPI benchmarks output
> using OMPI 1.6.5 with knem, and OMPI 1.8.3, where knem doesn't work (despite 
> my attempts to make it work).
> The older OMPI with knem shows very good speedups.
> For instance, in ping-pong on two processors with a 256 kB message size,
> OMPI 1.6.5+knem shows a ~32% speedup w.r.t. OMPI 1.8.3.
> 
> #bytes #repetitions      t[usec]   Mbytes/sec
> 262144          160        48.04      5203.93 (OMPI 1.6.5 + knem)
> 262144          160        63.72      3923.30 (OMPI 1.8.3, broken knem)
> 
> Numbers like these don't give me any incentive to upgrade
> our production codes to OMPI 1.8.
> Will this be fixed in the next Open MPI 1.8 release?
> 
> Thank you,
> Gus Correa
> 
> PS - Many thanks to Aurélien Bouteiller for pointing out the existence
> of the vader btl.  Without his tip I would still be on the dark side.
> 
> On 10/16/2014 05:46 PM, Gus Correa wrote:
>> 
>> On 10/16/2014 05:28 PM, Nathan Hjelm wrote:
>>> And it doesn't support knem at this time. Probably never will because of
>>> the existence of CMA.
>>> 
>>> -Nathan
>>> 
>> 
>> Thanks, Nathan
>> 
>> But for the benefit of mere mortals like me
>> who don't share the dark or the bright side of the force,
>> and just need to keep their MPI applications running in production mode,
>> hopefully with Open MPI 1.8,
>> can somebody explain more clearly what "vader" is about?
>> 
>> Thank you,
>> Gus Correa
>> 
>> 
>>> On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
>>>> FWIW: vader is the default in 1.8
>>>> 
>>>> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller
>>>> <boute...@icl.utk.edu> wrote:
>>>> 
>>>>> Are you sure you are not using the vader BTL?
>>>>> 
>>>>> Setting mca_btl_base_verbose and/or sm_verbose should spit out some
>>>>> knem initialization info.
>>>>> 
>>>>> The Linux CMA facility (Cross Memory Attach, which ships with most 3.1x
>>>>> Linux kernels) provides similar features, and is also supported in sm.
>>>>> 
>>>>> Aurelien
>>>>> --
>>>>>          ~~~ Aurélien Bouteiller, Ph.D. ~~~
>>>>>             ~ Research Scientist @ ICL ~
>>>>> The University of Tennessee, Innovative Computing Laboratory
>>>>> 1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
>>>>> tel: +1 (865) 974-9375       fax: +1 (865) 974-8296
>>>>> https://icl.cs.utk.edu/~bouteill/
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Oct 16, 2014, at 4:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>>>> 
>>>>>> Dear Open MPI developers
>>>>>> 
>>>>>> Well, I just can't keep my promises for too long ...
>>>>>> So, here I am pestering you again, although this time
>>>>>> it is not a request for more documentation.
>>>>>> Hopefully it is something more legit.
>>>>>> 
>>>>>> I am having trouble using knem with Open MPI 1.8.3,
>>>>>> and need your help.
>>>>>> 
>>>>>> I configured Open MPI 1.8.3 with knem.
>>>>>> I had done the same with some builds of Open MPI 1.6.5 before.
>>>>>> 
>>>>>> When I build and launch the Intel MPI benchmarks (IMB)
>>>>>> with Open MPI 1.6.5,
>>>>>> 'cat /dev/knem'
>>>>>> starts showing non-zero-and-growing statistics right away.
>>>>>> 
>>>>>> However, when I build and launch IMB with Open MPI 1.8.3,
>>>>>> /dev/knem shows only zeros,
>>>>>> no statistics growing, nothing.
>>>>>> Knem just seems to be completely asleep.
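>>>>>>
>>>>>> (In case it helps to reproduce this: I simply watch the counters on a
>>>>>> compute node while the benchmark runs, roughly like this, where the
>>>>>> node name is just an example:
>>>>>>
>>>>>>   ssh node26 'watch -n 1 cat /dev/knem'
>>>>>>
>>>>>> With 1.6.5 the counters grow steadily; with 1.8.3 they stay at zero.)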
>>>>>> 
>>>>>> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
>>>>>> at least not for me.
>>>>>> 
>>>>>> ***
>>>>>> 
>>>>>> The knem-related runtime environment is set up the
>>>>>> same way on both OMPI releases.
>>>>>> I tried setting it up both on the command line:
>>>>>> 
>>>>>> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
>>>>>> 
>>>>>> and on the MCA parameter file:
>>>>>> 
>>>>>> btl_sm_use_knem = 1
>>>>>> btl_sm_eager_limit = 32768
>>>>>> btl_sm_knem_dma_min = 1048576
>>>>>> 
>>>>>> and the behavior is the same (i.e., knem is active in 1.6.5,
>>>>>> but doesn't seem to be used by 1.8.3, as indicated by the
>>>>>> /dev/knem statistics).
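>>>>>>
>>>>>> (By "the MCA parameter file" I mean the usual per-user or system-wide
>>>>>> file, i.e. one of the standard locations, assuming a default install
>>>>>> prefix:
>>>>>>
>>>>>>   $HOME/.openmpi/mca-params.conf
>>>>>>   <OMPI install prefix>/etc/openmpi-mca-params.conf
>>>>>>
>>>>>> with the lines exactly as listed above.)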
>>>>>> 
>>>>>> ***
>>>>>> 
>>>>>> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
>>>>>> 
>>>>>> #define OMPI_BTL_SM_HAVE_KNEM 1
>>>>>> 
>>>>>> suggesting that both configurations picked up knem correctly.
>>>>>> 
>>>>>> On the other hand, when I do 'ompi_info --all --all |grep knem',
>>>>>> OMPI 1.6.5 shows "btl_sm_have_knem_support":
>>>>>> 
>>>>>> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data
>>>>>> source: default value)  Whether this component supports the knem
>>>>>> Linux kernel module or not'
>>>>>> 
>>>>>> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular
>>>>>> item ("btl_sm_have_knem_support"),
>>>>>> although the *other* 'btl sm knem' items are there,
>>>>>> namely "btl_sm_use_knem","btl_sm_knem_dma_min",
>>>>>> "btl_sm_knem_max_simultaneous".
>>>>>> 
>>>>>> I am scratching my head to understand why a parameter with such a
>>>>>> suggestive name ("btl_sm_have_knem_support"),
>>>>>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>>>>>> somehow vanished from ompi_info in OMPI 1.8.3.
>>>>>> 
>>>>>> ***
>>>>>> 
>>>>>> Questions:
>>>>>> 
>>>>>> - Am I doing something totally wrong,
>>>>>> perhaps with the knem runtime environment?
>>>>>> 
>>>>>> - Was knem somehow phased out in 1.8.3?
>>>>>> 
>>>>>> - Could there be a bad interaction with other runtime parameters that
>>>>>> somehow is knocking out knem in 1.8.3?
>>>>>> (FYI, besides knem, I'm just excluding the tcp btl, binding to
>>>>>> core, and reporting the bindings, which is exactly what I do on 1.6.5,
>>>>>> although the runtime parameter syntax has changed.)
>>>>>> 
>>>>>> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
>>>>>> (i.e. a bug)
>>>>>> 
>>>>>> - Is there a way to increase verbosity to detect if knem is being
>>>>>> used by OMPI?
>>>>>> That would certainly help to check what is going on.
>>>>>> I tried '-mca btl_base_verbose 30' but there was no trace of knem
>>>>>> in stderr/stdout of either 1.6.5 or 1.8.3.
>>>>>> So, the evidence I have that knem is
>>>>>> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
>>>>>> /dev/knem.
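>>>>>>
>>>>>> If a higher verbosity level would help, I am happy to rerun with
>>>>>> something along these lines and post the output (the value 100 is just
>>>>>> a guess at "verbose enough"):
>>>>>>
>>>>>>   mpirun -np 2 -mca btl sm,self -mca btl_base_verbose 100 \
>>>>>>          ./IMB-MPI1 PingPong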
>>>>>> 
>>>>>> ***
>>>>>> 
>>>>>> 
>>>>>> Thank you,
>>>>>> Gus Correa
>>>>>> 
>>>>>> ***
>>>>>> 
>>>>>> PS - As an aside, I also have some questions on the knem setup,
>>>>>> which I mostly copied from the knem web site
>>>>>> (hopefully Brice Goglin is listening ...):
>>>>>> 
>>>>>> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
>>>>>> or should it be larger/smaller/something else?
>>>>>> [OK, I know I should benchmark it, but exploring the whole parameter
>>>>>> space takes a long time, so why not ask?]
>>>>>> 
>>>>>> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
>>>>>> [I think I read somewhere that this DMA engine offload
>>>>>> is an Intel thing, not AMD.]
>>>>>> 
>>>>>> - How about btl_sm_knem_max_simultaneous?
>>>>>> That one is not mentioned on the knem web site.
>>>>>> Should I leave it at its default of zero, or set it to 1? 2? 4?
>>>>>> Something else?
>>>>>> 
>>>>>> 
>>>>>> Thanks again,
>>>>>> Gus Correa