I see several problems:

1) osu_latency only works with exactly two processes (see the example below).

2) You explicitly excluded shared-memory support by specifying only self and 
openib (or tcp). If you just want to disable tcp or openib, use --mca btl ^tcp 
or --mca btl ^openib.

Also, it looks like you have multiple active ports that are on different 
subnets. You can use --mca btl_openib_if_include to restrict Open MPI to a 
specific device or devices (e.g., mlx5_0).
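For example (mlx5_0 is only a guess at your HCA name; ibstat or ibv_devinfo 
will list the real device names on your nodes):

  mpirun --allow-run-as-root -x OMP_NUM_THREADS=2 -n 32 \
      -mca btl self,vader,openib -mca btl_openib_if_include mlx5_0 \
      ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi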

See this warning:

--------------------------------------------------------------------------
WARNING: There are more than one active ports on host 'localhost', but the
default subnet GID prefix was detected on more than one of these
ports.  If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI.  This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes must have different subnet ID
values.

Please see this FAQ entry for more details:

  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_default_gid_prefix to 0.
--------------------------------------------------------------------------


-Nathan

> On May 13, 2018, at 7:44 PM, Blade Shieh <bladesh...@gmail.com> wrote:
> 
> 
> /********** The problem ***********/
> 
> I have a cluster with 10GbE Ethernet and 100Gb InfiniBand. While running my 
> application, CAMx, I found that the performance with IB is not as good as 
> with Ethernet. That is confusing, because IB latency and bandwidth are 
> undoubtedly better than Ethernet's, as shown by the MPI benchmarks IMB-MPI1 
> and OSU.
> 
> 
> 
> /********** software stack ***********/
> 
> centos7.4 with kernel 4.11.0-45.6.1.el7a.aarch64
> 
> MLNX_OFED_LINUX-4.3-1.0.1.0 from 
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
> 
> gnu7.3 from the OpenHPC release.       yum install 
> gnu7-compilers-ohpc-7.3.0-43.1.aarch64
> 
> openmpi3 from the OpenHPC release.  yum install 
> openmpi3-gnu7-ohpc-3.0.0-36.4.aarch64
> 
> CAMx 6.4.0 from http://www.camx.com/
> 
> IMB from https://github.com/intel/mpi-benchmarks
> 
> OSU from http://mvapich.cse.ohio-state.edu/benchmarks/
> 
> 
> 
> 
> 
> /********** command lines are ********/
> 
> 
> 
> (time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2 
> -n 32 -mca btl_tcp_if_include eth2 
> ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_openib_log 2>&1
> 
> (time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n 
> 32 -mca btl_tcp_if_include eth2 
> ../../src/CAMx.v6.40.openMPI.gfortranomp.ompi) > camx_tcp_log 2>&1
> 
> 
> 
> (time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2 
> -n 32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin 1000) 
> > IMB_openib_log 2>&1
> 
> (time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n 
> 32 -mca btl_tcp_if_include eth2 IMB-MPI1 allreduce -msglog 8 -npmin 1000) > 
> IMB_tcp_log 2>&1
> 
> 
> 
> (time mpirun --allow-run-as-root -mca btl self,openib  -x OMP_NUM_THREADS=2 
> -n 32 -mca btl_tcp_if_include eth2 osu_latency) > osu_openib_log 2>&1
> 
> (time mpirun --allow-run-as-root -mca btl self,tcp  -x OMP_NUM_THREADS=2 -n 
> 32 -mca btl_tcp_if_include eth2 osu_latency) > osu_tcp_log 2>&1
> 
> 
> 
> /********** about openmpi and network config *************/
> 
> 
> 
> Please refer to the relevant log files in the attachment.
> 
> 
> 
> Best Regards,
> 
> Xie Bin
> 
> <ompi_support.tar.bz2>
