Hi Gilles,
Thank you for pointing out my error on *-N*.
You are also right that I had started the opensmd service earlier, so the
links could come up correctly. However, many IB-related commands, such as
ibhosts and ibdiagnet, still cannot be executed correctly.
As for the pml, I am pretty sure I was using ob1, because ompi_info shows
there is no ucx or mxm and ob1 has the highest priority.
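For reference, this is roughly how I checked (a quick sketch; the exact
ompi_info output can differ between Open MPI versions):

    ompi_info | grep -i " pml"                  # list the pml components that were built
    ompi_info --all | grep -i "pml_.*priority"  # show their priorities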

Best regards,
Xie Bin

Gilles Gouaillardet <gil...@rist.or.jp> wrote on Tue, 15 May 2018 at 10:09:

> Xie Bin,
>
>
> According to the man page, -N is equivalent to --npernode, which is
> equivalent to --map-by ppr:N:node.
>
> This is *not* equivalent to --map-by node:
>
> The former packs N tasks onto each node, while the latter scatters tasks
> across the nodes in round-robin fashion.
>
>
> [gilles@login ~]$ mpirun --host n0:2,n1:2 -N 2 --tag-output hostname | sort
> [1,0]<stdout>:n0
> [1,1]<stdout>:n0
> [1,2]<stdout>:n1
> [1,3]<stdout>:n1
>
>
> [gilles@login ~]$ mpirun --host n0:2,n1:2 -np 4 --tag-output -map-by node hostname | sort
> [1,0]<stdout>:n0
> [1,1]<stdout>:n1
> [1,2]<stdout>:n0
> [1,3]<stdout>:n1
>
>
> I am pretty sure a subnet manager was run at some point in time (so your
> HCAs could get their identifiers).
>
> /* feel free to reboot your nodes and see if ibstat still shows the
> adapters as active */
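>
> For instance (illustrative commands; mlx5_0 is the device name from your
> earlier mpirun lines):
>
> ibstat mlx5_0 | grep -i state   # show port state / physical state
> sminfo                          # query the subnet manager; errors out if none is reachable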
>
>
> Note you might also use --mca pml ob1 in order to make sure that neither
> mxm nor ucx is used.
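>
> For example, adapting your two-node command (the exact option list is
> illustrative):
>
> mpirun --allow-run-as-root --mca pml ob1 --mca btl self,openib \
>     --mca btl_openib_if_include mlx5_0 -N 16 myapp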
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 5/15/2018 10:45 AM, Blade Shieh wrote:
> > Hi, George:
> > My command lines are:
> > 1) single node
> > mpirun --allow-run-as-root -mca btl self,tcp(or openib) -mca
> > btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x
> > OMP_NUM_THREADS=2 -n 32 myapp
> > 2) 2-node cluster
> > mpirun --allow-run-as-root -mca btl ^tcp(or ^openib) -mca
> > btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x
> > OMP_NUM_THREADS=4 -N 16 myapp
> >
> > In the second case, I used -N, which is equal to --map-by node.
> >
> > Best regards,
> > Xie Bin
> >
> >
> > George Bosilca <bosi...@icl.utk.edu> wrote on Tue, 15 May 2018 at 02:07:
> >
> >     Shared memory communication is important for multi-core platforms,
> >     especially when you have multiple processes per node. But this is
> >     only part of your issue here.
> >
> >     You haven't specified how your processes will be mapped onto your
> >     resources. As a result, ranks 0 and 1 will be on the same node, so
> >     you are testing the shared memory support of whatever BTL you
> >     allow. In this case the performance will be much better for TCP
> >     than for IB, simply because you are not using your network but
> >     rather the node's capacity to move data across memory banks. In
> >     such an environment, TCP translates to a memcpy plus a system
> >     call, which is much faster than IB. That being said, it should not
> >     matter because shared memory is there to cover this case.
> >
> >     Add "--map-by node" to your mpirun command to measure the
> >     bandwidth between nodes.
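> >
> >     For example (an illustrative command line; substitute your own
> >     hosts and benchmark binary):
> >
> >     mpirun --host n0:1,n1:1 -np 2 --map-by node ./osu_latency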
> >
> >       George.
> >
> >
> >
> >     On Mon, May 14, 2018 at 5:04 AM, Blade Shieh <bladesh...@gmail.com>
> >     wrote:
> >
> >
> >         Hi, Nathan:
> >             Thanks for your reply.
> >         1) It was my mistake not to notice the correct usage of
> >         osu_latency. Now it works well, but openib is still slower.
> >         2) I did not use sm or vader because I wanted to compare the
> >         performance of tcp and openib. Besides, I will run the
> >         application on a cluster, so vader is not so important.
> >         3) Of course, I tried your suggestions. I used ^tcp/^openib and
> >         set btl_openib_if_include to mlx5_0 in a two-node cluster (IB
> >         direct-connected). The result did not change -- IB is still
> >         better in the MPI benchmark but worse in my application.
> >
> >         Best Regards,
> >         Xie Bin
> >
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
