Hi,

We appear to have a correctly set up Mellanox IB network (checked with
ibdiagnet, ibstat, iblinkinfo, ibqueryerrors(*)).  It's operating at Rate 40
FDR10.

But Open MPI programs (test and user) that specify the 'openib,self,sm'
BTL parameters do not seem to be using the IB network, according to
network-monitoring tools (dstat/tcpdump/ifconfig counters).

ib0 is the interface to the IB network; em1 is our general network.  It's
just a plain CentOS 6.5 system with Open MPI 1.8.1.

Can anyone advise whether this is normal behaviour?
How can I explicitly verify that user and test programs are using the IB
network?
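
One way this might be checked independently of ib0: the openib BTL talks to the HCA through RDMA verbs rather than through the IPoIB interface, so tools that watch ib0 (dstat, tcpdump, ifconfig) would not be expected to see that traffic. The HCA's own port counters should still move. The sketch below samples them twice and prints the difference. It assumes the usual Linux sysfs layout under /sys/class/infiniband; the 5-second interval and the helper names are arbitrary choices, not anything from Open MPI. On older HCAs these counters can be 32-bit and may saturate, in which case the difference would read as zero.

```shell
#!/bin/sh
# Sketch only: sample the HCA port data counters twice and print the difference.
# Assumes the usual Linux sysfs layout:
#   /sys/class/infiniband/<device>/ports/<port>/counters/

# port_xmit_data and port_rcv_data count in units of 4 octets.
words_to_bytes() {
    echo $(( $1 * 4 ))
}

# Print "<counter file> <value>" for each readable data counter.
snapshot() {
    for f in /sys/class/infiniband/*/ports/*/counters/port_xmit_data \
             /sys/class/infiniband/*/ports/*/counters/port_rcv_data; do
        if [ -r "$f" ]; then
            echo "$f $(cat "$f")"
        fi
    done
}

interval=5   # seconds between the two samples (arbitrary choice)
before=$(snapshot)
if [ -z "$before" ]; then
    echo "no InfiniBand port counters found under /sys/class/infiniband"
else
    sleep "$interval"
    printf '%s\n' "$before" | while read -r file old; do
        new=$(cat "$file")
        echo "$file: $(words_to_bytes $(( new - old ))) bytes in ${interval}s"
    done
fi
```

Run on max140 or max141 while an 'openib,self,sm' job is active: if these counters grow by many megabytes while dstat reports ib0 as idle, the MPI traffic is most likely going over verbs rather than IPoIB.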

I've done a lot of Google and FAQ searching, but my case does not seem to
come up in either.

(1) no traffic on ib0 at all
/usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
--mca btl openib,self,sm -host max140,max141 -n 8 \
/usr/lib64/openmpi/bin/mpitests-IMB-MPI1

Monitoring with the command below shows no traffic at all on ib0:
dstat -n -N ib0,em1,total

--net/ib0-----net/em1----net/total-
 recv  send: recv  send: recv  send
   0     0 :   0     0 :   0     0
   0     0 : 864B    0 : 864B    0
   0     0 : 452B  832B: 452B  832B
   0     0 :3554B  230B:3554B  230B

Monitoring with the command below also shows no traffic at all on ib0:
sudo tcpdump -i ib0
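
Since mpirun here comes from /usr/local/openmpi-1.8.1 while the test binary lives under /usr/lib64/openmpi, two further checks may be worth making: whether this Open MPI install lists the openib BTL at all, and which libmpi the test binary resolves to. The sketch below uses ompi_info and ldd with the paths from this post. The btl_listed helper is a hypothetical name, and the pattern it matches assumes the usual "MCA btl: <name> (...)" lines in ompi_info output.

```shell
#!/bin/sh
# Sketch only: check for the openib BTL and inspect the test binary's libmpi.
# Paths are the ones used in this post; adjust for other installs.

MPI_PREFIX=/usr/local/openmpi-1.8.1
TEST_BIN=/usr/lib64/openmpi/bin/mpitests-IMB-MPI1

# Succeeds if the ompi_info output on stdin lists BTL component $1.
# Assumes lines of the form "MCA btl: openib (MCA v2.0, ...)".
btl_listed() {
    grep -Eq "MCA btl: +$1( |\$)"
}

if [ -x "$MPI_PREFIX/bin/ompi_info" ]; then
    if "$MPI_PREFIX/bin/ompi_info" | btl_listed openib; then
        echo "openib BTL is listed by $MPI_PREFIX/bin/ompi_info"
    else
        echo "openib BTL is NOT listed by $MPI_PREFIX/bin/ompi_info"
    fi
else
    echo "ompi_info not found under $MPI_PREFIX"
fi

# Show which MPI library the benchmark binary would load.
if [ -x "$TEST_BIN" ]; then
    ldd "$TEST_BIN" | grep -i libmpi || echo "no libmpi dependency shown"
fi
```

If ldd points at a libmpi outside /usr/local/openmpi-1.8.1, the benchmark may have been built against a different Open MPI than the mpirun launching it, which could be worth ruling out before comparing the three cases.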

(2) Roughly shared network usage between ib0 and em1
/usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
--mca btl tcp,vader,self -host max140,max141 -n 8 \
/usr/lib64/openmpi/bin/mpitests-IMB-MPI1

--net/ib0-----net/em1----net/total-
 recv  send: recv  send: recv  send
   0     0 :   0     0 :   0     0
1061k 1129k:  97M   96M:  98M   97M
  46M   45M:9356k   11M:  56M   56M
  84M   82M:  12M   12M:  96M   93M
 160k  167k:  82M   82M:  82M   82M

tcpdump -i ib0 # shows lots of network traffic
ifconfig ib0 # shows increasing packet counters

(3) mostly uses ib0, "feels" fast
/usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
--mca btl tcp,vader,self --mca btl_tcp_if_include ib0 \
-host max140,max141 -n 8 \
/usr/lib64/openmpi/bin/mpitests-IMB-MPI1

[r...@max140.mdc-berlin.net:/home/darnold] $ dstat -n -N ib0,em1,total
--net/ib0-----net/em1----net/total-
 recv  send: recv  send: recv  send
   0     0 :   0     0 :   0     0
 506k  538k:2472B 3880B: 508k  542k
 371M  362M:5628B   11k: 371M  362M
1000M  972M:8517B 5328B:1000M  972M
  62M   63M:1248B 1424B:  62M   63M

tcpdump -i ib0 # shows lots of network traffic
ifconfig ib0 # shows increasing packet counters
