Thank you for this clear explanation. I do not have True Scale on 'my' machine, so unless Mellanox gets involved - no juice for me.

Makes me wonder. libfabric is marketed as a next-generation solution. Clearly it has some reported advantage for Cisco usnic, but since you claim no improvement over psm, then I guess it is nothing to look forward to, is it?

Anyway, thanks a lot for clearing this up.

Marcin


On 09/30/2015 08:13 PM, Howard Pritchard wrote:
Hi Marcin,


2015-09-30 9:19 GMT-06:00 marcin.krotkiewski <marcin.krotkiew...@gmail.com>:

    Thank you, and Jeff, for clarification.

    Before I bother you all more without the need, I should probably
    say I was hoping to use libfabric/OpenMPI on an InfiniBand
    cluster. Somehow now I feel I have confused this altogether, so
    maybe I should go one step back:

     1. libfabric is hardware independent, and does support
    InfiniBand, right?


The short answer is yes: libfabric is hardware independent (and, on good days, works on OS X as well as Linux). The longer answer is that varying amounts of work have gone into implementing providers (the plug-ins that let libfabric interface to different networks) for different networks.

There is a sockets provider. It gets a good amount of attention because it's a base reference provider. psm/psm2 providers are available. I have used the psm provider some on a True Scale cluster. It doesn't offer better performance than just using psm directly, but it does appear to work.

There is an mxm provider, but it was not implemented by Mellanox, and I can't get it to compile on my ConnectX-3 system using mxm 1.5.

There is a vanilla verbs provider, but it doesn't support the FI_EP_RDM endpoint type, which is used by the non-Cisco component available in Open MPI (the ofi mtl).

When you build and install libfabric, there should be an fi_info binary installed in $(LIBFABRIC_INSTALL_DIR)/bin.
On my True Scale cluster the output is:

psm: psm
    version: 0.9
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP

To use mtl/ofi, a provider must at a minimum support the FI_EP_RDM endpoint type (see above). Note that on the True Scale cluster the verbs provider is built, but it only supports the FI_EP_MSG endpoint type, so mtl/ofi can't use it.
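
As a rough illustration of that check, here is a minimal Python sketch that scans fi_info-style text and lists the providers reporting a given endpoint type. The sample text is the True Scale output from this message; the helper name providers_with_endpoint is hypothetical and not part of libfabric, and the parser assumes the simple "provider: fabric" / indented "key: value" layout shown here, which other fi_info versions may not follow.

```python
# Sketch: find providers in fi_info-style output that offer a given endpoint type.
# SAMPLE is the True Scale output quoted in this message.
# providers_with_endpoint() is a hypothetical helper, not part of libfabric.

SAMPLE = """\
psm: psm
    version: 0.9
    type: FI_EP_RDM
    protocol: FI_PROTO_PSMX
verbs: IB-0x80fe
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_RDMA_CM_IB_RC
sockets: IP
    version: 1.0
    type: FI_EP_MSG
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_SOCK_TCP
sockets: IP
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_SOCK_TCP
"""


def providers_with_endpoint(fi_info_text, ep_type="FI_EP_RDM"):
    """Return provider names (in order, without duplicates) offering ep_type."""
    found = []
    provider = None
    for line in fi_info_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between fields
        key, _, value = line.partition(":")
        if not line[0].isspace():
            # An unindented "name: fabric" line starts a new provider entry.
            provider = key.strip()
        elif key.strip() == "type" and value.strip() == ep_type:
            if provider is not None and provider not in found:
                found.append(provider)
    return found


if __name__ == "__main__":
    print(providers_with_endpoint(SAMPLE))
```

On this sample only psm and sockets report FI_EP_RDM, which matches the point that mtl/ofi cannot use the verbs provider here.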

     2. I read that OpenMPI provides an interface to libfabric through
    btl/usnic and mtl/ofi. Can any of those use libfabric on
    InfiniBand networks?


If you have Intel True Scale or its follow-on, then the answer is yes, although by default Open MPI uses mtl/psm on that network.


    Please forgive my ignorance, the number of different options is
    rather overwhelming.

    Marcin



    On 09/30/2015 04:26 PM, Howard Pritchard wrote:

    Hello Marcin

    What configure options are you using besides with-libfabric?

    Could you post your config.log file to the list?

    It looks like fi_ext_usnic.h is only installed if the usnic
    libfabric provider could be built. When you configured libfabric,
    which providers were listed at the end of the configure run? Maybe
    attach config.log from the libfabric build?
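
A quick way to confirm whether a given libfabric install has that header is sketched below in Python. It assumes headers are installed under <prefix>/include, so that rdma/fi_ext_usnic.h (the path the configure test looks for) would sit at <prefix>/include/rdma/fi_ext_usnic.h; the function name usnic_header_installed is made up for illustration.

```python
import os
import tempfile


def usnic_header_installed(prefix):
    """True if <prefix>/include/rdma/fi_ext_usnic.h exists as a regular file.

    Assumption: libfabric headers live under <prefix>/include.
    """
    header = os.path.join(prefix, "include", "rdma", "fi_ext_usnic.h")
    return os.path.isfile(header)


if __name__ == "__main__":
    # Demonstration against a throwaway directory rather than a real install.
    with tempfile.TemporaryDirectory() as prefix:
        print(usnic_header_installed(prefix))  # header not present yet
        rdma_dir = os.path.join(prefix, "include", "rdma")
        os.makedirs(rdma_dir)
        open(os.path.join(rdma_dir, "fi_ext_usnic.h"), "w").close()
        print(usnic_header_installed(prefix))  # header now present
```

In practice the argument would be the libfabric install prefix; if the check fails there, the usnic provider was most likely not built.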

    If your cluster has Cisco usNICs you should probably be using
    libfabric/Cisco Open MPI. If you are using Intel Omni-Path you may
    want to try the ofi mtl. It's not selected by default, however.

    Howard

    ----------

    sent from my smart phonr so no good type.

    Howard

    On Sep 30, 2015 5:35 AM, "Marcin Krotkiewski"
    <marcin.krotkiew...@gmail.com> wrote:

        Hi,

        I am trying to compile the 2.x branch with libfabric support,
        but get this error during configure:

        configure:100708: checking rdma/fi_ext_usnic.h presence
        configure:100708: gcc -E
        -I/cluster/software/VERSIONS/openmpi.gnu.2.x/include
        -I/usit/abel/u1/marcink/software/ompi-release-2.x/opal/mca/hwloc/hwloc1110/hwloc/include
        conftest.c
        conftest.c:688:31: fatal error: rdma/fi_ext_usnic.h: No such
        file or directory
        [...]
        configure:100708: checking for rdma/fi_ext_usnic.h
        configure:100708: result: no
        configure:101253: checking if MCA component btl:usnic can compile
        configure:101255: result: no

        Which is correct - the file is not there. I have downloaded a
        fresh libfabric-1.1.0.tar.bz2 and it does not have this file.
        Probably OpenMPI needs some updates?

        I am also wondering what the state of libfabric support in
        OpenMPI is nowadays. I have seen a recent (March) presentation
        about it, so it seems to be an actively developed feature. Is
        this correct? It seemed from the presentation that there are
        benefits to this approach, but is it mature enough in
        OpenMPI, or will it still take some time?

        Thanks!

        Marcin
        _______________________________________________
        users mailing list
        us...@open-mpi.org
        Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
        Link to this post:
        http://www.open-mpi.org/community/lists/users/2015/09/27728.php



_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/09/27750.php
