David,

ib0 means IP over IB (IPoIB).
This is *not* what you want to use, since it is way slower than native
InfiniBand.
If you run
mpirun --mca btl self,sm,openib ...
on more than one node, the only BTL usable for inter-node communication is
openib, so if any communication happens, that means openib is being used.
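
If you want Open MPI itself to confirm which BTLs it selected, one quick
sanity check (just a sketch, reusing the hosts and benchmark from your mail)
is to raise the BTL framework verbosity:

/usr/local/openmpi-1.8.1/bin/mpirun --mca btl openib,self,sm \
  --mca btl_base_verbose 30 -host max140,max141 -n 8 \
  /usr/lib64/openmpi/bin/mpitests-IMB-MPI1

and look for the openib component being selected in the output. Since tcp is
excluded from the list, the job would also abort at startup complaining that
some processes are unable to reach each other if the openib BTL could not be
brought up, which is another confirmation that inter-node traffic goes over
native IB.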

In order to monitor native InfiniBand traffic, you can use
perfquery -x
and you will see the port counters increase while the job runs.
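
For example (a rough sketch; perfquery with no arguments queries the local
port, pass a LID and port number if you want to watch a remote HCA), snapshot
the extended 64-bit counters before and after a run and compare:

perfquery -x > counters.before
/usr/local/openmpi-1.8.1/bin/mpirun --mca btl openib,self,sm ...
perfquery -x > counters.after
diff counters.before counters.after

PortXmitData and PortRcvData are counted in 4-byte units, so they should grow
by roughly the amount of data the benchmark moved, while the ib0 and em1
counters in dstat stay flat.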

Cheers,

Gilles

On Friday, October 16, 2015, David Arnold <darno...@gmail.com> wrote:

> Hi,
>
> We appear to have a correctly setup Mellanox IB network (ibdiagnet, ibstat,
> iblinkinfo, ibqueryerrors(*)).  It's operating at Rate 40 FDR10.
>
> But Open MPI programs (test and user) that specify the 'openib,self,sm'
> parameters do not seem to be using the IB network, according to the
> network-monitoring tools (dstat/tcpdump/ifconfig counters).
>
> ib0 is the interface to the IB network, em1 is our general network.  It's
> just a plain CentOS 6.5 system with Open MPI 1.8.1.
>
> Is anyone able to advise if this is normal behaviour?
> How can I explicitly verify that user and test programs are using the IB
> network?
>
> I've done a lot of Google and FAQ searching, but my case does not seem to
> come up in either.
>
> (1) no traffic on ib0 at all
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
> --mca btl openib,self,sm -host max140,max141 -n 8 \
> /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> Monitoring with the command below shows no traffic at all on ib0:
> dstat -n -N ib0,em1,total
>
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>    0     0 :   0     0 :   0     0
>    0     0 : 864B    0 : 864B    0
>    0     0 : 452B  832B: 452B  832B
>    0     0 :3554B  230B:3554B  230B
>
> Monitoring with the command below also shows no traffic at all on ib0:
> sudo tcpdump -i ib0
>
> (2) Roughly shared network usage between ib0 and em1
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
> --mca btl tcp,vader,self -host max140,max141 -n 8 \
> /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>    0     0 :   0     0 :   0     0
> 1061k 1129k:  97M   96M:  98M   97M
>   46M   45M:9356k   11M:  56M   56M
>   84M   82M:  12M   12M:  96M   93M
>  160k  167k:  82M   82M:  82M   82M
>
> tcpdump -i ib0 # shows lots of network traffic
> ifconfig ib0 # shows increasing packet counters
>
> (3) mostly uses ib0, "feels" fast
> /usr/local/openmpi-1.8.1/bin/mpirun --mca btl_openib_verbose 1 \
> --mca btl tcp,vader,self --mca btl_tcp_if_include ib0 -host max140,max141
> -n 8
> /usr/lib64/openmpi/bin/mpitests-IMB-MPI1
>
> [r...@max140.mdc-berlin.net:/home/darnold] $ dstat -n -N ib0,em1,total
> --net/ib0-----net/em1----net/total-
>  recv  send: recv  send: recv  send
>    0     0 :   0     0 :   0     0
>  506k  538k:2472B 3880B: 508k  542k
>  371M  362M:5628B   11k: 371M  362M
> 1000M  972M:8517B 5328B:1000M  972M
>   62M   63M:1248B 1424B:  62M   63M
>
> tcpdump -i ib0 # shows lots of network traffic
> ifconfig ib0 # shows increasing packet counters
>
>
