Thank you for this clear explanation. I do not have True Scale on 'my'
machine, so unless Mellanox gets involved, no juice for me.
Makes me wonder. libfabric is marketed as a next-generation solution.
Clearly it has some reported advantage for Cisco usnic, but since you
report no improvement over psm, I guess it is nothing to look
forward to, is it?
Anyway, thanks a lot for clearing this up.
Marcin
On 09/30/2015 08:13 PM, Howard Pritchard wrote:
Hi Marcin,
2015-09-30 9:19 GMT-06:00 marcin.krotkiewski
<marcin.krotkiew...@gmail.com>:
Thank you, and Jeff, for clarification.
Before I bother you all more than necessary, I should probably
say I was hoping to use libfabric/Open MPI on an InfiniBand
cluster. Somehow now I feel I have confused this altogether, so
maybe I should go one step back:
1. libfabric is hardware independent, and does support
InfiniBand, right?
The short answer is yes, libfabric is hardware independent (and, on
good days, works on OS X as well as Linux).
The longer answer is that varying amounts of work have gone into
implementing providers (the plugins in libfabric that interface to
different networks) for the different networks.
There is a sockets provider. That gets a good amount of attention
because it's the base reference provider.
psm/psm2 providers are available. I have used the psm provider some
on a True Scale cluster. It doesn't offer better performance than
using psm directly, but it does appear to work.
There is an mxm provider, but it was not implemented by Mellanox, and I
can't get it to compile on my ConnectX-3 system using MXM 1.5.
There is a vanilla verbs provider, but it doesn't support the FI_EP_RDM
endpoint type, which is what the non-Cisco libfabric component of
Open MPI (the ofi mtl) uses.
When you build and install libfabric, there should be an fi_info
binary installed in $(LIBFABRIC_INSTALL_DIR)/bin.
On my True Scale cluster the output is:
psm: psm
    version: 0.9
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
In order to use mtl/ofi, at a minimum a provider needs to support
the FI_EP_RDM endpoint type (see above). Note that on the True Scale
cluster the verbs provider is built, but it only supports FI_EP_MSG
endpoint types, so mtl/ofi can't use it.
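To make that FI_EP_RDM check easy to repeat, here is a small shell sketch. The fi_info output is inlined as sample data (copied from the listing above) so the filter can be run anywhere; the file path /tmp/fi_info_sample.txt is just an illustration, and on a real system you would pipe fi_info itself into the awk script instead.

```shell
# Sample fi_info output, inlined for illustration only.
cat > /tmp/fi_info_sample.txt <<'EOF'
psm: psm
    version: 0.9
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
EOF

# Remember the current provider name (lines starting at column 0),
# and print it whenever an FI_EP_RDM endpoint type appears under it.
awk '/^[a-z]/ { prov = $1; sub(/:$/, "", prov) }
     /type: FI_EP_RDM/ { print prov }' /tmp/fi_info_sample.txt
```

On the sample above this lists psm and sockets, i.e. the providers mtl/ofi could use.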
2. I read that Open MPI provides an interface to libfabric through
btl/usnic and mtl/ofi. Can either of those use libfabric on
InfiniBand networks?
If you have Intel True Scale or its follow-on, then the answer is yes,
although by default Open MPI uses mtl/psm on that network.
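For completeness, here is what selecting mtl/ofi over the default looks like on the command line. The MTLs sit under the cm PML, so both are named; ./my_app and the process count are placeholders, not something from the thread. Shown as a dry run (echo) so the command line itself is the output rather than actually launching MPI:

```shell
# Force the cm PML and the ofi MTL instead of the default mtl/psm
# selection. ./my_app is a placeholder for your MPI executable.
# echo makes this a dry run; drop it to actually launch.
echo mpirun --mca pml cm --mca mtl ofi -np 2 ./my_app
```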
Please forgive my ignorance; the number of different options is
rather overwhelming.
Marcin
On 09/30/2015 04:26 PM, Howard Pritchard wrote:
Hello Marcin
What configure options are you using besides --with-libfabric?
Could you post your config.log file to the list?
It looks like fi_ext_usnic.h is only installed if the usnic
libfabric provider could be built. When you configured libfabric,
what providers were listed at the end of the configure run? Maybe
attach the config.log from the libfabric build?
If your cluster has Cisco usNICs, you should probably be using
libfabric/Cisco Open MPI. If you are using Intel Omni-Path, you may
want to try the ofi mtl. It's not selected by default, however.
Howard
----------
sent from my smart phonr so no good type.
Howard
On Sep 30, 2015 5:35 AM, "Marcin Krotkiewski"
<marcin.krotkiew...@gmail.com> wrote:
Hi,
I am trying to compile the 2.x branch with libfabric support,
but get this error during configure:
configure:100708: checking rdma/fi_ext_usnic.h presence
configure:100708: gcc -E
-I/cluster/software/VERSIONS/openmpi.gnu.2.x/include
-I/usit/abel/u1/marcink/software/ompi-release-2.x/opal/mca/hwloc/hwloc1110/hwloc/include
conftest.c
conftest.c:688:31: fatal error: rdma/fi_ext_usnic.h: No such
file or directory
[...]
configure:100708: checking for rdma/fi_ext_usnic.h
configure:100708: result: no
configure:101253: checking if MCA component btl:usnic can compile
configure:101255: result: no
Which is correct: the file is not there. I have downloaded a
fresh libfabric-1.1.0.tar.bz2 and it does not contain this file.
Perhaps Open MPI needs some updates?
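As a quick local check before re-running Open MPI's configure, one can test whether the libfabric install actually ships the usnic extension header that the btl/usnic configure probe looks for. A minimal sketch; LIBFABRIC_PREFIX is an assumed install location, not something from the thread, so adjust it to your own prefix:

```shell
# LIBFABRIC_PREFIX is an assumed install location; adjust to yours.
LIBFABRIC_PREFIX=${LIBFABRIC_PREFIX:-/usr/local}

# Open MPI's btl/usnic configure check needs this header to succeed.
if [ -f "$LIBFABRIC_PREFIX/include/rdma/fi_ext_usnic.h" ]; then
    echo "usnic extension header present: btl/usnic can compile"
else
    echo "usnic extension header missing: btl/usnic will be skipped"
fi
```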
I am also wondering about the state of libfabric support in
Open MPI nowadays. I have seen a recent (March) presentation
about it, so it seems to be an actively developed feature. Is
this correct? It seemed from the presentation that there are
benefits to this approach, but is it mature enough in
Open MPI, or will it take some more time?
Thanks!
Marcin
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/09/27728.php
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/09/27733.php
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/09/27743.php
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/09/27750.php