On 10/16/2014 04:49 PM, Ralph Castain wrote:
> FWIW: vader is the default in 1.8
Yes, Ralph, thank you, I just noticed it in my job's stderr,
after Aurelien pointed out that this new "vader" thing existed.
What a quick promotion: from nonexistent to default btl!
But what is "vader" after all?
Any pointers, links, ... eh ... documentation (oops, I said I would not
ask for it ...)
There is nothing about "vader" in the FAQ.
How does it affect the other btls?
Is it some kind of "wrapper btl" that decides which of the lower
level btls to use (sm, openib, etc)?
How does it affect knem?
What are vader's pros/cons w.r.t. using the other btls?
In which conditions is it good or bad to use it vs. the other btls?
What do I gain/lose if I do "btl = sm,self,openib"
(which presumably will knock off tcp and "vader"),
or maybe "btl=^tcp,^vader" ?
Many thanks,
Gus Correa
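On the btl selection syntax asked about above, a minimal sketch for a POSIX shell. Per the Open MPI FAQ's description of MCA component selection, the btl parameter takes either an inclusive list or an exclusive list, and in the exclusive form a single leading ^ applies to every name after it, so the second form would be written ^tcp,vader rather than ^tcp,^vader. The process count and ./a.out are placeholders (assumptions, not from this thread), so the commands are printed rather than run. Nothing here says which choice performs better.

```shell
#!/bin/sh
# Inclusive list: only the named BTLs are eligible, so tcp and vader
# are both left out.
BTL_INCLUDE="sm,self,openib"

# Exclusive list: one leading ^ negates the whole comma-separated list.
BTL_EXCLUDE="^tcp,vader"

# ./a.out and -np 4 are placeholders; print the command lines only.
echo mpirun --mca btl "$BTL_INCLUDE" -np 4 ./a.out
echo mpirun --mca btl "$BTL_EXCLUDE" -np 4 ./a.out
```

The same values can go in the MCA parameter file as "btl = sm,self,openib" or "btl = ^tcp,vader".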
> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller
<boute...@icl.utk.edu> wrote:
Are you sure you are not using the vader BTL ?
Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem
initialization info.
The Linux CMA (Cross Memory Attach) facility, which ships with most 3.1x Linux kernels, has similar
features and is also supported in sm.
Aurelien
--
~~~ Aurélien Bouteiller, Ph.D. ~~~
~ Research Scientist @ ICL ~
The University of Tennessee, Innovative Computing Laboratory
1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
tel: +1 (865) 974-9375 fax: +1 (865) 974-8296
https://icl.cs.utk.edu/~bouteill/
On Oct 16, 2014, at 4:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Dear Open MPI developers
Well, I just can't keep my promises for too long ...
So, here I am pestering you again, although this time
it is not a request for more documentation.
Hopefully it is something more legit.
I am having trouble using knem with Open MPI 1.8.3,
and need your help.
I configured Open MPI 1.8.3 with knem.
I had done the same with some builds of Open MPI 1.6.5 before.
When I build and launch the Intel MPI benchmarks (IMB)
with Open MPI 1.6.5,
'cat /dev/knem'
starts showing non-zero-and-growing statistics right away.
However, when I build and launch IMB with Open MPI 1.8.3,
/dev/knem shows only zeros,
no statistics growing, nothing.
Knem just seems to be completely asleep.
So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
at least not for me.
***
The runtime environment related to knem is set up the
same way on both OMPI releases.
I tried setting it up both on the command line:
-mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
and in the MCA parameter file:
btl_sm_use_knem = 1
btl_sm_eager_limit = 32768
btl_sm_knem_dma_min = 1048576
and the behavior is the same (i.e., knem is active in 1.6.5,
but doesn't seem to be used by 1.8.3, as indicated by the
/dev/knem statistics.)
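Besides the command line and the parameter file, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. A minimal sketch of the same three settings in that form, for a POSIX shell; whether setting them this way changes the 1.8.3 behavior described here is not established.

```shell
#!/bin/sh
# Same values as the parameter file above, expressed as environment
# variables. Open MPI picks up any variable named OMPI_MCA_<parameter>.
export OMPI_MCA_btl_sm_use_knem=1
export OMPI_MCA_btl_sm_eager_limit=32768
export OMPI_MCA_btl_sm_knem_dma_min=1048576

# Show what a subsequently launched mpirun would inherit.
env | grep '^OMPI_MCA_btl_sm_' | sort
```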
***
When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
#define OMPI_BTL_SM_HAVE_KNEM 1
suggesting that both configurations picked up knem correctly.
On the other hand, when I do 'ompi_info --all --all |grep knem',
OMPI 1.6.5 shows "btl_sm_have_knem_support":
'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
default value) Whether this component supports the knem Linux kernel module or not'
By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item
("btl_sm_have_knem_support"),
although the *other* 'btl sm knem' items are there,
namely "btl_sm_use_knem", "btl_sm_knem_dma_min", "btl_sm_knem_max_simultaneous".
I am scratching my head to understand why a parameter with such a
suggestive name ("btl_sm_have_knem_support"),
so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
somehow vanished from ompi_info in OMPI 1.8.3.
***
Questions:
- Am I doing something totally wrong,
perhaps with the knem runtime environment?
- Was knem somehow phased out in 1.8.3?
- Could there be a bad interaction with other runtime parameters that
somehow is knocking out knem in 1.8.3?
(FYI, besides knem, I'm just excluding the tcp btl, binding to core, and
reporting the bindings, which is exactly what I do on 1.6.5,
although the runtime parameter syntax has changed.)
- Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
(i.e. a bug)
- Is there a way to increase verbosity to detect if knem is being
used by OMPI?
That would certainly help to check what is going on.
I tried '-mca btl_base_verbose 30' but there was no trace of knem
in stderr/stdout of either 1.6.5 or 1.8.3.
So, the evidence I have that knem is
active in 1.6.5 but not in 1.8.3 comes only from the statistics in
/dev/knem.
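The /dev/knem check described above can be made repeatable with a small wrapper. This is a minimal sketch for a POSIX shell, not a feature of Open MPI or knem: the function name knem_activity is hypothetical, and it only compares the raw text of the statistics file before and after a command, without assuming anything about the file's format.

```shell
#!/bin/sh
# knem_activity STATS_FILE COMMAND [ARGS...]
# Snapshot STATS_FILE, run COMMAND, snapshot again, and report whether
# the contents changed. STATS_FILE would be /dev/knem on a knem node.
knem_activity() {
    knem_stats_file=$1
    shift
    knem_before=$(cat "$knem_stats_file")
    "$@"
    knem_after=$(cat "$knem_stats_file")
    if [ "$knem_before" = "$knem_after" ]; then
        echo "knem statistics: unchanged"
    else
        echo "knem statistics: changed"
    fi
}
```

A call might look like 'knem_activity /dev/knem mpirun -np 2 ./IMB-MPI1 PingPong', where the mpirun arguments are placeholders for the actual benchmark launch. Any other job using knem on the same node during the run would also change the counters, so the check is only meaningful on an otherwise idle node.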
***
Thank you,
Gus Correa
***
PS - As an aside, I also have some questions on the knem setup,
which I mostly copied from the knem web site
(hopefully Brice Goglin is listening ...):
- Is 32768 in 'btl_sm_eager_limit 32768' a good number,
or should it be larger/smaller/something else?
[OK, I know I should benchmark it, but exploring the whole parameter
space takes a long time, so why not ask?]
- Is it worth using 'btl_sm_knem_dma_min 1048576'?
[I think I read somewhere that this dma engine offload
is an Intel thing, not AMD.]
- How about btl_sm_knem_max_simultaneous?
That one is not mentioned on the knem web site.
Should I leave it at its default of zero or set it to 1? 2? 4? Something else?
Thanks again,
Gus Correa
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/10/25511.php