Shared memory communication is important for multi-core platforms, especially when you have multiple processes per node. But this is only part of your issue here.
You haven't specified how your processes will be mapped onto your resources. As a result, ranks 0 and 1 will be on the same node, so you are testing the shared memory support of whatever BTL you allow. In this case the performance will be much better for TCP than for IB, simply because you are not exercising your network at all, only the node's capacity to move data across memory banks. In such an environment TCP translates to a memcpy plus a system call, which is much faster than IB. That being said, it should not matter, because shared memory is there to cover exactly this case. Add "--map-by node" to your mpirun command to measure the bandwidth between nodes; a sample invocation follows after the quoted message.

George.

On Mon, May 14, 2018 at 5:04 AM, Blade Shieh <bladesh...@gmail.com> wrote:
> Hi, Nathan:
> Thanks for your reply.
> 1) It was my mistake not to notice the usage of osu_latency. Now it works well, but is still slower with openib.
> 2) I did not use sm or vader because I wanted to compare the performance of tcp and openib. Besides, I will run the application on a cluster, so vader is not so important.
> 3) Of course, I tried your suggestions. I used ^tcp/^openib and set btl_openib_if_include to mlx5_0 in a two-node cluster (IB direct-connected). The result did not change -- IB is still better in the MPI benchmark but worse in my application.
>
> Best Regards,
> Xie Bin
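[A minimal sketch of the suggested runs, assuming an OSU micro-benchmark binary named osu_bw and a two-node allocation; the binary name and process count are placeholders, adjust them to your setup:

    # one rank per node, so the benchmark actually crosses the interconnect (IB case)
    mpirun --map-by node -np 2 --mca btl self,openib ./osu_bw
    # same mapping, restricted to the TCP BTL for comparison
    mpirun --map-by node -np 2 --mca btl self,tcp ./osu_bw

With this mapping, ranks 0 and 1 land on different nodes, so the measurement reflects the network rather than intra-node memory copies.]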