Long story short: you always need a subnet manager to initialize the fabric.

That means you can run the subnet manager and stop it once each HCA has been assigned a LID.

Once the SM is stopped, the commands that interact with it (ibhosts, ibdiagnet) will obviously fail.
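
A minimal sketch, assuming opensmd is packaged as a systemd service
(adjust the service name to your distribution):

    systemctl start opensmd   # start the SM (service name may differ); it sweeps the fabric and assigns LIDs
    ibstat | grep -i lid      # each port should now report a non-zero base LID
    systemctl stop opensmd    # optional: stop the SM once the LIDs are assigned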


Cheers,


Gilles


On 5/15/2018 4:51 PM, John Hearns via users wrote:
Xie, as far as I know you need to run OpenSM even with only two hosts.

On 15 May 2018 at 03:29, Blade Shieh <bladesh...@gmail.com> wrote:

    Hi, John:

    You are right about the network setup. I have no IB switch and
    just connect the two servers with an IB cable. I did not even start
    the opensmd service because it seemed unnecessary in this situation.
    Could that be the reason why IB performs worse?

    Interconnection details are in the attachment.

    Best Regards,

    Xie Bin



    John Hearns via users <users@lists.open-mpi.org> wrote on Monday, 14 May 2018 at 17:45:

        Xie Bin, I do hate to ask this. You say "in a two-node
        cluster (IB direct-connected)."
        Does that mean that you have no IB switch, and that there is a
        single IB cable joining up these two servers?
        If so, please run: ibstatus, ibhosts, ibdiagnet
        I am trying to check whether the IB fabric is functioning
        properly in that situation.
        (We also need to check whether there is a Subnet Manager - so
        run sminfo.)
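
        A quick sketch of those checks (assuming the infiniband-diags
        package is installed; run them on either node):

            ibstatus    # local HCA port state, physical state and rate
            ibhosts     # host channel adapters discovered on the fabric
            ibdiagnet   # full fabric discovery and diagnostic report
            sminfo      # reports the active Subnet Manager, if any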

        But you do say that the IMB test gives good results for IB, so
        you must have IB working properly.
        Therefore I am an idiot...



        On 14 May 2018 at 11:04, Blade Shieh <bladesh...@gmail.com> wrote:


            Hi, Nathan:
                Thanks for your reply.
            1) It was my mistake not to notice the correct usage of
            osu_latency. Now it works well, but latency is still worse
            with openib.
            2) I did not use sm or vader because I wanted to compare the
            performance of tcp and openib. Besides, I will run the
            application on a cluster, so vader is not so important.
            3) Of course, I tried your suggestions. I used ^tcp/^openib
            and set btl_openib_if_include to mlx5_0 in a two-node
            cluster (IB direct-connected). The result did not change
            -- IB is still better in the MPI benchmark but worse in my
            application.
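
            For reference, a sketch of such an openib run (node1, node2
            and the osu_latency path are placeholders):

                # node1,node2 and ./osu_latency are placeholders
                mpirun -np 2 --host node1,node2 \
                       --mca btl self,openib \
                       --mca btl_openib_if_include mlx5_0 \
                       ./osu_latency

            The tcp counterpart swaps in --mca btl self,tcp (plus
            --mca btl_tcp_if_include if several interfaces are present).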

            Best Regards,
            Xie Bin

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
