David,

ib0 is the IPoIB (IP over InfiniBand) interface; that is *not* what you want to use, since it is way slower than native InfiniBand, and native InfiniBand traffic does not show up on its counters at all. If you mpirun --mca btl self,sm,openib ... on more than one node, the only BTL usable for inter-node communication is openib, so if any communication happens at all, that means openib is being used.
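If you want to double check which BTLs are actually selected, you can bump up the verbosity. A rough sketch (the exact output format varies between Open MPI releases):

    /usr/local/openmpi-1.8.1/bin/mpirun --mca btl self,sm,openib \
        --mca btl_base_verbose 100 -host max140,max141 -n 8 \
        /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

The verbose output lists which BTL components are opened and selected. Also note that since tcp is not in the btl list, a two-node run cannot silently fall back to Ethernet: if openib were unusable, the job would typically abort complaining that some pairs of processes are unable to reach each other.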
In order to monitor native InfiniBand traffic, you can use perfquery -x; you will see the port counters increasing with some traffic (see the sketch after the quoted message below).

Cheers,

Gilles

On Friday, October 16, 2015, David Arnold <darno...@gmail.com> wrote:

> Hi,
>
> We appear to have a correctly set up Mellanox IB network (ibdiagnet, ibstat,
> iblinkinfo, ibqueryerrors(*)). It's operating at Rate 40 FDR10.
>
> But openMPI programs (test and user) that are specifying the 'openib,self,sm'
> parameters do not seem to be using the IB network according to
> network-monitoring tools (dstat/tcpdump/ifconfig counters).
>
> ib0 is the interface to the IB network, em1 is our general network. It's
> just a plain CentOS 6.5 system with openMPI 1.8.1.
>
> Is anyone able to advise if this is normal behaviour?
> How can I explicitly verify user and test programs are using the IB
> network?
>
> I've done a lot of Google and FAQ searching, but my case does not seem to
> come up in either.
>
> (1) no traffic on ib0 at all
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
>     --mca btl openib,self,sm -host max140,max141 -n 8 \
>     /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> Monitoring with the below shows no traffic at all on ib0:
> dstat -n -N ib0,em1,total
>
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>     0     0 :    0     0 :    0     0
>     0     0 : 864B     0 : 864B     0
>     0     0 : 452B  832B : 452B  832B
>     0     0 :3554B  230B :3554B  230B
>
> Monitoring with the below also shows no traffic at all on ib0:
> sudo tcpdump -i ib0
>
> (2) Roughly shared network usage between ib0 and em1
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
>     --mca btl tcp,vader,self -host max140,max141 -n 8 \
>     /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>     0     0 :    0     0 :    0     0
> 1061k 1129k:  97M   96M :  98M   97M
>   46M   45M :9356k   11M:  56M   56M
>   84M   82M :  12M   12M:  96M   93M
>  160k  167k :  82M   82M:  82M   82M
>
> tcpdump -i ib0   # shows lots of network traffic
> ifconfig ib0     # shows increasing packet counters
>
> (3) mostly uses ib0, "feels" fast
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
>     --mca btl tcp,vader,self --mca btl_tcp_if_include ib0 \
>     -host max140,max141 -n 8 \
>     /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> [r...@max140.mdc-berlin.net:/home/darnold] $ dstat -n -N ib0,em1,total
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>     0     0 :    0     0 :    0     0
>  506k  538k :2472B 3880B: 508k  542k
>  371M  362M :5628B   11k: 371M  362M
> 1000M  972M :8517B 5328B:1000M  972M
>   62M   63M :1248B 1424B:  62M   63M
>
> tcpdump -i ib0   # shows lots of network traffic
> ifconfig ib0     # shows increasing packet counters
>
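A rough sketch of the perfquery approach mentioned above (this assumes perfquery from infiniband-diags is in the PATH and queries the local HCA's active port; the /tmp file names are just placeholders):

    perfquery -x > /tmp/perf_before.txt          # snapshot the 64-bit port counters

    /usr/local/openmpi-1.8.1/bin/mpirun --mca btl openib,self,sm \
        -host max140,max141 -n 8 /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

    perfquery -x > /tmp/perf_after.txt           # snapshot again after the run
    diff /tmp/perf_before.txt /tmp/perf_after.txt

If the job really goes over native InfiniBand, PortXmitData and PortRcvData should jump by a large amount (on most HCAs these counters are in units of 4 bytes), while the ib0/em1 counters stay more or less flat.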