Hi Gilles,

Thank you for pointing out my error with *-N*. You are also right that I had started the opensmd service earlier, which is why the link could be brought up correctly. However, many IB-related commands, such as ibhosts and ibdiagnet, still cannot be executed correctly.

As for the pml, I am fairly sure I was using ob1, because ompi_info shows there is no ucx or mxm, and ob1 has the highest priority.
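For reference, this is roughly how I checked which pml gets selected; osu_latency is only a placeholder for any small MPI program, and the verbose level is just an example value:

    # list the pml components available in this build (neither ucx nor mxm shows up)
    $ ompi_info | grep -i "MCA pml"

    # ask the pml framework to log which component it selects at runtime
    $ mpirun -np 2 --mca pml_base_verbose 100 ./osu_latency 2>&1 | grep -i select

    # or force ob1 explicitly, as you suggested
    $ mpirun -np 2 --mca pml ob1 ./osu_latency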
Best regards,
Xie Bin

On Tue, May 15, 2018 at 10:09, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Xie Bin,
>
> According to the man page, -N is equivalent to npernode, which is
> equivalent to --map-by ppr:N:node.
>
> This is *not* equivalent to -map-by node:
> the former packs tasks onto the same node, and the latter scatters tasks
> across the nodes.
>
> [gilles@login ~]$ mpirun --host n0:2,n1:2 -N 2 --tag-output hostname | sort
> [1,0]<stdout>:n0
> [1,1]<stdout>:n0
> [1,2]<stdout>:n1
> [1,3]<stdout>:n1
>
> [gilles@login ~]$ mpirun --host n0:2,n1:2 -np 4 --tag-output -map-by node hostname | sort
> [1,0]<stdout>:n0
> [1,1]<stdout>:n1
> [1,2]<stdout>:n0
> [1,3]<stdout>:n1
>
> I am pretty sure a subnet manager was run at some point in time (so your
> HCAs could get their identifiers).
>
> /* feel free to reboot your nodes and see if ibstat still shows the
> adapters as active */
>
> Note you might also use --mca pml ob1 in order to make sure neither mxm
> nor ucx is used.
>
> Cheers,
>
> Gilles
>
> On 5/15/2018 10:45 AM, Blade Shieh wrote:
> > Hi, George:
> > My command lines are:
> > 1) single node
> > mpirun --allow-run-as-root -mca btl self,tcp (or openib) -mca
> > btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x
> > OMP_NUM_THREADS=2 -n 32 myapp
> > 2) 2-node cluster
> > mpirun --allow-run-as-root -mca btl ^tcp (or ^openib) -mca
> > btl_tcp_if_include eth2 -mca btl_openib_if_include mlx5_0 -x
> > OMP_NUM_THREADS=4 -N 16 myapp
> >
> > In the second case, I used -N, which is equal to --map-by node.
> >
> > Best regards,
> > Xie Bin
> >
> > On Tue, May 15, 2018 at 02:07, George Bosilca <bosi...@icl.utk.edu> wrote:
> >
> > Shared memory communication is important for multi-core platforms,
> > especially when you have multiple processes per node. But this is
> > only part of your issue here.
> >
> > You haven't specified how your processes will be mapped onto your
> > resources. As a result, ranks 0 and 1 will be on the same node, so
> > you are testing the shared memory support of whatever BTL you
> > allow. In this case the performance will be much better for TCP
> > than for IB, simply because you are not using your network but
> > its capacity to move data across memory banks. In such an
> > environment, TCP translates to a memcpy plus a system call, which
> > is much faster than IB. That being said, it should not matter,
> > because shared memory is there to cover this case.
> >
> > Add "--map-by node" to your mpirun command to measure the
> > bandwidth between nodes.
> >
> > George.
> >
> > On Mon, May 14, 2018 at 5:04 AM, Blade Shieh <bladesh...@gmail.com> wrote:
> >
> > Hi, Nathan:
> > Thanks for your reply.
> > 1) It was my mistake not to notice the usage of osu_latency.
> > Now it works well, but is still poorer with openib.
> > 2) I did not use sm or vader because I wanted to compare
> > performance between tcp and openib. Besides, I will run the
> > application on a cluster, so vader is not so important.
> > 3) Of course, I tried your suggestions. I used ^tcp/^openib and
> > set btl_openib_if_include to mlx5_0 in a two-node cluster (IB
> > direct-connected). The result did not change -- IB is still
> > better in the MPI benchmark but poorer in my application.
> >
> > Best Regards,
> > Xie Bin
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users