Hi,

We appear to have a correctly set up Mellanox IB network (checked with ibdiagnet, ibstat, iblinkinfo and ibqueryerrors(*)). It's operating at Rate 40, FDR10.
But Open MPI programs (tests and user code) that specify the 'openib,self,sm' BTL parameters do not seem to be using the IB network, according to network-monitoring tools (dstat, tcpdump, ifconfig counters). ib0 is the interface to the IB network; em1 is our general network. It's a plain CentOS 6.5 system with Open MPI 1.8.1.

Is anyone able to advise whether this is normal behaviour? How can I explicitly verify that user and test programs are using the IB network? I've done a lot of Google and FAQ searching, but my case does not seem to come up in either.

(1) No traffic on ib0 at all

    /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
        --mca btl openib,self,sm -host max140,max141 -n 8 \
        /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

Monitoring with the command below shows no traffic at all on ib0:

    dstat -n -N ib0,em1,total
    --net/ib0-----net/em1----net/total-
     recv  send: recv  send: recv  send
       0     0 :   0     0 :   0     0
       0     0 : 864B    0 : 864B    0
       0     0 : 452B  832B: 452B  832B
       0     0 :3554B  230B:3554B  230B

Monitoring with the command below also shows no traffic at all on ib0:

    sudo tcpdump -i ib0

(2) Network usage roughly shared between ib0 and em1

    /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
        --mca btl tcp,vader,self -host max140,max141 -n 8 \
        /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

    --net/ib0-----net/em1----net/total-
     recv  send: recv  send: recv  send
       0     0 :   0     0 :   0     0
    1061k 1129k:  97M   96M:  98M   97M
      46M   45M:9356k   11M:  56M   56M
      84M   82M:  12M   12M:  96M   93M
     160k  167k:  82M   82M:  82M   82M

    tcpdump -i ib0   # shows lots of network traffic
    ifconfig ib0     # shows increasing packet counters

(3) Mostly uses ib0, "feels" fast

    /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
        --mca btl tcp,vader,self --mca btl_tcp_if_include ib0 \
        -host max140,max141 -n 8 \
        /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

    [r...@max140.mdc-berlin.net:/home/darnold] $ dstat -n -N ib0,em1,total
    --net/ib0-----net/em1----net/total-
     recv  send: recv  send: recv  send
       0     0 :   0     0 :   0     0
     506k  538k:2472B 3880B: 508k  542k
     371M  362M:5628B   11k: 371M  362M
    1000M  972M:8517B 5328B:1000M  972M
      62M   63M:1248B 1424B:  62M   63M

    tcpdump -i ib0   # shows lots of network traffic
    ifconfig ib0     # shows increasing packet counters
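One possible check, sketched under assumptions: the openib BTL moves data through InfiniBand verbs (RDMA) rather than through the IPoIB interface ib0, so dstat, tcpdump and the ifconfig ib0 counters would not be expected to see that traffic, whereas the HCA's own port counters under /sys/class/infiniband do count it. The bash functions below snapshot port_xmit_data and port_rcv_data for every HCA port before and after a command and print the difference. Assumptions to be aware of: the sysfs layout /sys/class/infiniband/<hca>/ports/<port>/counters/ (the IB_SYSFS_ROOT override and the function names are hypothetical, added for illustration and testing); the counters being in units of 4 bytes; only inter-node traffic through the local node's HCA being visible (shared-memory traffic between ranks on the same node never touches the HCA); and, on some kernel/HCA combinations, the counters being 32-bit and saturating, so a short run or perfquery from infiniband-diags may be more reliable.

```shell
#!/usr/bin/env bash
# Sketch: measure bytes moved through each InfiniBand HCA port while a command
# runs, using the kernel's per-port counters in sysfs. These counters include
# verbs/RDMA traffic, which bypasses the IP stack behind ib0.
# Assumption: port_xmit_data/port_rcv_data count in units of 4 bytes.

# Print one line per port: "<hca>/<port> <xmit_units> <rcv_units>".
# IB_SYSFS_ROOT is a hypothetical override of the sysfs root (useful for tests).
ib_snapshot() {
  local root="${IB_SYSFS_ROOT:-/sys/class/infiniband}"
  local dir hca port
  for dir in "$root"/*/ports/*; do
    # Skip unmatched globs and ports without readable data counters.
    [ -r "$dir/counters/port_xmit_data" ] || continue
    [ -r "$dir/counters/port_rcv_data" ] || continue
    port=${dir##*/}        # e.g. "1"
    hca=${dir%/ports/*}    # e.g. ".../mlx4_0"
    hca=${hca##*/}         # e.g. "mlx4_0"
    printf '%s/%s %s %s\n' "$hca" "$port" \
      "$(cat "$dir/counters/port_xmit_data")" \
      "$(cat "$dir/counters/port_rcv_data")"
  done
}

# Run "$@", then print the bytes each port sent/received during the run.
# Returns the exit status of the wrapped command.
ib_traffic() {
  local before after status
  before=$(ib_snapshot)
  if [ -z "$before" ]; then
    echo "ib_traffic: no InfiniBand port counters found" >&2
  fi
  "$@"
  status=$?
  after=$(ib_snapshot)
  # Tag each snapshot line (B = before, A = after), let awk pair them up by
  # port name, and convert the 4-byte units into bytes.
  {
    printf '%s\n' "$before" | sed 's/^/B /'
    printf '%s\n' "$after" | sed 's/^/A /'
  } | awk 'NF != 4 { next }
           $1 == "B" { tx[$2] = $3; rx[$2] = $4; next }
           ($2 in tx) { printf "%s sent=%.0f bytes recv=%.0f bytes\n", $2, ($3 - tx[$2]) * 4, ($4 - rx[$2]) * 4 }'
  return "$status"
}
```

Usage, assuming the functions are saved in a file called ib_traffic.sh (a hypothetical name) and run on max140: `source ib_traffic.sh`, then prefix the case (1) command with `ib_traffic`. If openib is carrying the inter-node traffic, the sent/recv figures for the active port should grow far beyond what dstat reports for ib0.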