I see several problems:

1) osu_latency only works with two procs, but you launched it with -n 32.
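For example, a two-process run of the latency benchmark would look like this (keeping your other settings; the log file name is just an illustration):

```shell
# Hypothetical two-process invocation; adjust the log name to your setup.
(time mpirun --allow-run-as-root -mca btl self,openib -n 2 osu_latency) > osu_openib_log 2>&1
```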
2) You explicitly excluded shared-memory support by specifying only self and openib (or tcp). If you just want to disable tcp or openib, use --mca btl ^tcp or --mca btl ^openib.

Also, it looks like you have multiple active ports that are on different subnets. You can use --mca btl_openib_if_include to restrict Open MPI to a specific device or devices (i.e. mlx5_0). See this warning:

--------------------------------------------------------------------------
WARNING: There are more than one active ports on host 'localhost', but the
default subnet GID prefix was detected on more than one of these ports. If
these ports are connected to different physical IB networks, this
configuration will fail in Open MPI. This version of Open MPI requires that
every physically separate IB subnet that is used between connected MPI
processes must have different subnet ID values.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_default_gid_prefix to 0.
--------------------------------------------------------------------------

-Nathan

> On May 13, 2018, at 7:44 PM, Blade Shieh <bladesh...@gmail.com> wrote:
>
> /********** The problem ***********/
>
> I have a cluster with 10GbE Ethernet and 100 Gb InfiniBand. While running
> my application - CAMx - I found that the performance with IB is not as
> good as with Ethernet. That is confusing, because IB latency and bandwidth
> are undoubtedly better than Ethernet's, as shown by the MPI benchmarks
> IMB-MPI1 and OSU.
>
> /********** software stack ***********/
>
> CentOS 7.4 with kernel 4.11.0-45.6.1.el7a.aarch64
> MLNX_OFED_LINUX-4.3-1.0.1.0 from
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
> gnu7.3 from the OpenHPC release: yum install gnu7-compilers-ohpc-7.3.0-43.1.aarch64
> openmpi3 from the OpenHPC release: yum install openmpi3-gnu7-ohpc-3.0.0-36.4.aarch64
> CAMx 6.4.0 from http://www.camx.com/
> IMB from https://github.com/intel/mpi-benchmarks
> OSU from http://mvapich.cse.ohio-state.edu/benchmarks/
>
> /********** command lines ***********/
>
> (time mpirun --allow-run-as-root -mca btl self,openib -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_openib_log 2>&1
> (time mpirun --allow-run-as-root -mca btl self,tcp -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_tcp_log 2>&1
>
> (time mpirun --allow-run-as-root -mca btl self,openib -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin 1000) > IMB_openib_log 2>&1
> (time mpirun --allow-run-as-root -mca btl self,tcp -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin 1000) > IMB_tcp_log 2>&1
>
> (time mpirun --allow-run-as-root -mca btl self,openib -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 osu_latency) > osu_openib_log 2>&1
> (time mpirun --allow-run-as-root -mca btl self,tcp -x OMP_NUM_THREADS=2 -n 32 -mca btl_tcp_if_include eth2 osu_latency) > osu_tcp_log 2>&1
>
> /********** about openmpi and network config ***********/
>
> Please refer to the relevant log files in the attachment.
>
> Best Regards,
> Xie Bin
>
> <ompi_support.tar.bz2>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
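To make the suggestions above concrete, the runs could look like the following sketch. The device name mlx5_0 and the eth2 interface are taken from your logs but should be verified for your nodes (e.g. with ibstat and ip link); ./app stands in for your actual binary:

```shell
# Keep shared memory: exclude only the transport you do not want,
# instead of listing self,openib or self,tcp explicitly.
mpirun -n 32 -mca btl ^tcp ./app                                    # openib + vader (sm) + self
mpirun -n 32 -mca btl ^openib -mca btl_tcp_if_include eth2 ./app    # tcp + vader (sm) + self

# Pin the openib BTL to one HCA to avoid the multi-port
# default-subnet-GID warning.
mpirun -n 32 -mca btl_openib_if_include mlx5_0 ./app
```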