On Jul 8, 2011, at 1:31 PM, Steve Kargl wrote:

> It seems that openmpi-1.4.4 compiled code is trying to use the
> wrong nic.  My /etc/hosts file has
> 
> 10.208.78.111           hpc.apl.washington.edu hpc
> 192.168.0.10            node10.cimu.org node10 n10 master
> 192.168.0.11            node11.cimu.org node11 n11
> 192.168.0.12            node12.cimu.org node12 n12
> ... down to ...
> 192.168.0.21            node21.cimu.org node21 n21
> 
> Note: node10 and hpc are the same system (two different NICs).

Don't confuse the machinefile with the NICs that OMPI will try to use.  The
machinefile only lists the hosts on which OMPI will launch processes; it does
not influence which NICs OMPI will use for MPI communications.
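
A quick way to see the parameters that actually govern interface selection is
ompi_info; for example (assuming your 1.4.4 install prefix):

    /usr/local/openmpi-1.4.4/bin/ompi_info --param btl tcp | grep if_

This lists btl_tcp_if_include / btl_tcp_if_exclude and their current values.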

> hpc:kargl[252] /usr/local/openmpi-1.4.4/bin/mpif90 -o z -g -O ring_f90.f90 
> hpc:kargl[253] cat > mf1
> node10 slots=1
> node11 slots=1
> node12 slots=1
> hpc:kargl[254] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf1 ./z
> Process 0 sending           10  to            1  tag          201  (           3  processes in ring)
> 
> In another xterm, if I attach with gdb to the process on node10, I see:
> 
> (gdb) bt
> #0  0x00000003c10f9b9c in kevent () from /lib/libc.so.7
> #1  0x000000000052ca18 in kq_dispatch ()
> #2  0x000000000052ba93 in opal_event_base_loop ()
> #3  0x000000000052549b in opal_progress ()
> #4  0x000000000048fcfc in mca_pml_ob1_send ()
> #5  0x0000000000428873 in PMPI_Send ()
> #6  0x000000000041a890 in pmpi_send__ ()
> #7  0x000000000041a3f0 in ring () at ring_f90.f90:34
> #8  0x000000000041a640 in main (argc=<value optimized out>, 
>    argv=<value optimized out>) at ring_f90.f90:10
> #9  0x000000000041a1cc in _start ()
> (gdb) quit
> 
> Now, eliminating node10 from the machine file, I see:
> 
> hpc:kargl[255] cat > mf2
> node11 slots=1
> node12 slots=1
> node13 slots=1
> hpc:kargl[256] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf2 ./z
> Process 0 sending           10  to            1  tag          201  (           3  processes in ring)
> Process 0 sent to            1
> Process 0 decremented value:           9
> Process 0 decremented value:           8
> Process 0 decremented value:           7
> Process 0 decremented value:           6
> Process 0 decremented value:           5
> Process 0 decremented value:           4
> Process 0 decremented value:           3
> Process 0 decremented value:           2
> Process 0 decremented value:           1
> Process 0 decremented value:           0
> Process            0  exiting
> Process            1  exiting
> Process            2  exiting
> 
> I also have a simple MPI test program, netmpi.c, from Argonne.
> It shows:
> 
> hpc:kargl[263] /usr/local/openmpi-1.4.4/bin/mpicc -o z -g -O GetOpt.c netmpi.c
> hpc:kargl[264] cat mf_ompi_3 
> node11.cimu.org slots=1
> node16.cimu.org slots=1
> hpc:kargl[265] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf_ompi_3 ./z
> 1: node16.cimu.org
> 0: node11.cimu.org
> Latency: 0.000073617
> Sync Time: 0.000147234
> Now starting main loop
>  0:         0 bytes 16384 times -->    0.00 Mbps in 0.000073612 sec
>  1:         1 bytes 16384 times -->    0.10 Mbps in 0.000073612 sec
>  2:         2 bytes 3396 times -->    0.21 Mbps in 0.000073611 sec
>  3:         3 bytes 1698 times -->    0.31 Mbps in 0.000073609 sec
>  4:         5 bytes 2264 times -->    0.52 Mbps in 0.000073610 sec
>  5:         7 bytes 1358 times -->    0.73 Mbps in 0.000073608 sec
> 
> 
> hpc:kargl[268] cat mf_ompi_1
> node10.cimu.org slots=1
> node16.cimu.org slots=1
> hpc:kargl[267] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf_ompi_1 ./z
> 0: hpc.apl.washington.edu
> 1: node16.cimu.org

What function is netmpi.c using to get the hostname that it prints?  It is
probably using MPI_Get_processor_name() or gethostname(), both of which may
simply return whatever hostname(1) returns.

Again -- this is not an indicator of which NIC Open MPI is using.
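
For illustration, here's a minimal standalone sketch (hypothetical -- not part
of netmpi.c) that prints both values; on most systems they match the
hostname(1) output regardless of which NIC carries the MPI traffic:

    /* host_check.c -- hypothetical test showing that both calls report
     * the OS hostname, which says nothing about the NIC Open MPI picks. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char mpi_name[MPI_MAX_PROCESSOR_NAME];
        char sys_name[256];
        int len, rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Get_processor_name(mpi_name, &len);   /* usually the OS hostname */
        gethostname(sys_name, sizeof(sys_name));  /* what hostname(1) prints */

        printf("%d: MPI_Get_processor_name=%s gethostname=%s\n",
               rank, mpi_name, sys_name);
        MPI_Finalize();
        return 0;
    }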

> (gdb) bt
> #0  0x00000003c0bedb9c in kevent () from /lib/libc.so.7
> #1  0x000000000052d648 in kq_dispatch ()
> #2  0x000000000052c6c3 in opal_event_base_loop ()
> #3  0x00000000005260cb in opal_progress ()
> #4  0x0000000000491d1c in mca_pml_ob1_send ()
> #5  0x000000000043c753 in PMPI_Send ()
> #6  0x000000000041a112 in Sync (p=0x7fffffffd4d0) at netmpi.c:573
> #7  0x000000000041a3cf in DetermineLatencyReps (p=0x3) at netmpi.c:593
> #8  0x000000000041a4fe in TestLatency (p=0x3) at netmpi.c:630
> #9  0x000000000041a958 in main (argc=1, argv=0x7fffffffd6a0) at netmpi.c:213
> (gdb) quit

The easiest way to fix this is likely to use the btl_tcp_if_include or 
btl_tcp_if_exclude MCA parameters -- i.e., tell OMPI exactly which interfaces 
to use:

    http://www.open-mpi.org/faq/?category=tcp#tcp-selection
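
For example (em0/em1 are placeholders here -- substitute whatever ifconfig
reports for your public and private NICs on the FreeBSD nodes):

    # use only the private 192.168.0.x interface for MPI traffic:
    /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_tcp_if_include em1 \
        -machinefile mf_ompi_1 ./z

    # or exclude the public NIC instead; note that overriding the default
    # exclude list means you must also exclude loopback (lo0 on FreeBSD):
    /usr/local/openmpi-1.4.4/bin/mpiexec --mca btl_tcp_if_exclude lo0,em0 \
        -machinefile mf_ompi_1 ./z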

In principle, however, OMPI should be able to determine that 192.168.0.x is
not reachable from the 10.x network (assuming your netmasks are set right), and
automatically avoid the 10.x network when reaching any of the non-node10
machines.  It's curious that this is not happening; I wonder if this is some
kind of quirk of OMPI's reachability algorithms
(http://www.open-mpi.org/faq/?category=tcp#tcp-routability) on FreeBSD...?
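
The core of that reachability pass boils down to a netmask comparison.  A toy
sketch of the idea (greatly simplified -- the real logic in OMPI's TCP BTL
weighs every local/remote interface pair):

    /* subnet_check.c -- toy illustration of the subnet-match test that
     * underlies TCP reachability decisions (simplified). */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Return 1 if a and b fall in the same IPv4 subnet of the given prefix. */
    static int same_subnet(const char *a, const char *b, int prefix)
    {
        struct in_addr ia, ib;
        uint32_t mask = prefix ? htonl(0xffffffffu << (32 - prefix)) : 0;

        inet_pton(AF_INET, a, &ia);
        inet_pton(AF_INET, b, &ib);
        return (ia.s_addr & mask) == (ib.s_addr & mask);
    }

    int main(void)
    {
        /* With /24 netmasks, the public NIC cannot reach the cluster
         * subnet, so OMPI should prefer the 192.168.0.x interface: */
        printf("10.208.78.111 <-> 192.168.0.16: %d\n",
               same_subnet("10.208.78.111", "192.168.0.16", 24)); /* 0 */
        printf("192.168.0.10  <-> 192.168.0.16: %d\n",
               same_subnet("192.168.0.10", "192.168.0.16", 24));  /* 1 */
        return 0;
    }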

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

