A few clarifying questions:

What is your netmask on these hosts?

Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep within the application? Can you replicate this with a simple MPI application that essentially calls MPI_INIT, MPI_ALLREDUCE, and MPI_FINALIZE?

Can you replicate this with a simple MPI app that does an MPI_SEND / MPI_RECV between two processes on the different subnets?

Thanks.

> -----Original Message-----
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of openmpi-user
> Sent: Sunday, July 02, 2006 7:20 AM
> To: us...@open-mpi.org
> Subject: [OMPI users] OS X, OpenMPI 1.1: An error occurred in
> MPI_Allreduce on communicator MPI_COMM_WORLD
>
> Hi All,
>
> when the nodes belong to different subnets the following
> error messages
> pop up:
> [powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
> [powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
> [powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
> [powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> Here hostfile sets up three nodes in two subnets (192.168.3.x and
> 192.168.2.x with mask 255.255.255.0). The 192.168.3.x-nodes are
> connected via Gigabit-Ethernet, the 192.168.2.x-nodes are
> connected via
> WLAN.
>
> Frank
>
>
> This is the full output:
>
> [powerbook:/Network/CFD/MVH-1.0] motte% mpirun -d -np 7 --hostfile
> ./hostfile /Network/CFD/MVH-1.0/vhone
> [powerbook.2-net:20821] procdir: (null)
> [powerbook.2-net:20821] jobdir: (null)
> [powerbook.2-net:20821] unidir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> [powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
> [powerbook.2-net:20821] tmp: /tmp
> [powerbook.2-net:20821] connect_uni: contact info read
> [powerbook.2-net:20821] connect_uni: connection not allowed
> [powerbook.2-net:20821] [0,0,0] setting up session dir with
> [powerbook.2-net:20821] tmpdir /tmp
> [powerbook.2-net:20821] universe default-universe-20821
> [powerbook.2-net:20821] user motte
> [powerbook.2-net:20821] host powerbook.2-net
> [powerbook.2-net:20821] jobid 0
> [powerbook.2-net:20821] procid 0
> [powerbook.2-net:20821] procdir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> -20821/0/0
> [powerbook.2-net:20821] jobdir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0
> [powerbook.2-net:20821] unidir:
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
> [powerbook.2-net:20821] tmp: /tmp
> [powerbook.2-net:20821] [0,0,0] contact_file
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe -20821/universe-setup.txt
> [powerbook.2-net:20821] [0,0,0] wrote setup file
> [powerbook.2-net:20821] pls:rsh: local csh: 1, local bash: 0
> [powerbook.2-net:20821] pls:rsh: assuming same remote shell
> as local shell
> [powerbook.2-net:20821] pls:rsh: remote csh: 1, remote bash: 0
> [powerbook.2-net:20821] pls:rsh: final template argv:
> [powerbook.2-net:20821] pls:rsh: /usr/bin/ssh <template> orted
> --debug --bootproxy 1 --name <template> --num_procs 4 --vpid_start 0
> --nodename <template> --universe
> motte@powerbook.2-net:default-universe-20821 --nsreplica
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20821] pls:rsh: launching on node Powerbook.2-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: Powerbook.2-net is a LOCAL node
> [powerbook.2-net:20821] pls:rsh: changing to directory /Users/motte
> [powerbook.2-net:20821] pls:rsh: executing: orted --debug
> --bootproxy 1
> --name 0.0.1 --num_procs 4 --vpid_start 0 --nodename Powerbook.2-net
> --universe motte@powerbook.2-net:default-universe-20821 --nsreplica
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20822] [0,0,1] setting up session dir with
> [powerbook.2-net:20822] universe default-universe-20821
> [powerbook.2-net:20822] user motte
> [powerbook.2-net:20822] host Powerbook.2-net
> [powerbook.2-net:20822] jobid 0
> [powerbook.2-net:20822] procid 1
> [powerbook.2-net:20822] procdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> -20821/0/1
> [powerbook.2-net:20822] jobdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0
> [powerbook.2-net:20822] unidir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20822] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:20822] tmp: /tmp
> [powerbook.2-net:20821] pls:rsh: launching on node g4d003.3-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: g4d003.3-net is a REMOTE node
> [powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh g4d003.3-net
> orted --debug --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0
> --nodename g4d003.3-net --universe
> motte@powerbook.2-net:default-universe-20821 --nsreplica
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20821] pls:rsh: launching on node G5Dual.3-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: G5Dual.3-net is a REMOTE node
> [powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net
> orted --debug --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0
> --nodename G5Dual.3-net --universe
> motte@powerbook.2-net:default-universe-20821 --nsreplica
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [g4d003.3-net:00396] [0,0,2] setting up session dir with
> [g4d003.3-net:00396] universe default-universe-20821
> [g4d003.3-net:00396] user motte
> [g4d003.3-net:00396] host g4d003.3-net
> [g4d003.3-net:00396] jobid 0
> [g4d003.3-net:00396] procid 2
> [g4d003.3-net:00396] procdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/0/2
> [g4d003.3-net:00396] jobdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/0
> [g4d003.3-net:00396] unidir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00396] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00396] tmp: /tmp
> [g5dual.3-net:00938] [0,0,3] setting up session dir with
> [g5dual.3-net:00938] universe default-universe-20821
> [g5dual.3-net:00938] user motte
> [g5dual.3-net:00938] host G5Dual.3-net
> [g5dual.3-net:00938] jobid 0
> [g5dual.3-net:00938] procid 3
> [g5dual.3-net:00938] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/0/3
> [g5dual.3-net:00938] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/0
> [g5dual.3-net:00938] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00938] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00938] tmp: /tmp
> [powerbook.2-net:20826] [0,1,6] setting up session dir with
> [powerbook.2-net:20826] universe default-universe-20821
> [powerbook.2-net:20826] user motte
> [powerbook.2-net:20826] host Powerbook.2-net
> [powerbook.2-net:20826] jobid 1
> [powerbook.2-net:20826] procid 6
> [powerbook.2-net:20826] procdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> -20821/1/6
> [powerbook.2-net:20826] jobdir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/1
> [powerbook.2-net:20826] unidir:
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20826] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:20826] tmp: /tmp
> [g5dual.3-net:00940] [0,1,0] setting up session dir with
> [g5dual.3-net:00940] universe default-universe-20821
> [g5dual.3-net:00940] user motte
> [g5dual.3-net:00940] host G5Dual.3-net
> [g5dual.3-net:00940] jobid 1
> [g5dual.3-net:00940] procid 0
> [g5dual.3-net:00940] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/0
> [g5dual.3-net:00940] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00940] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00940] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00940] tmp: /tmp
> [g5dual.3-net:00946] [0,1,3] setting up session dir with
> [g5dual.3-net:00946] universe default-universe-20821
> [g5dual.3-net:00946] user motte
> [g5dual.3-net:00946] host G5Dual.3-net
> [g5dual.3-net:00946] jobid 1
> [g5dual.3-net:00946] procid 3
> [g5dual.3-net:00946] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/3
> [g5dual.3-net:00946] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00946] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00946] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00946] tmp: /tmp
> [g5dual.3-net:00942] [0,1,1] setting up session dir with
> [g5dual.3-net:00942] universe default-universe-20821
> [g5dual.3-net:00942] user motte
> [g5dual.3-net:00942] host G5Dual.3-net
> [g5dual.3-net:00942] jobid 1
> [g5dual.3-net:00942] procid 1
> [g5dual.3-net:00942] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/1
> [g5dual.3-net:00942] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00942] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00942] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00942] tmp: /tmp
> [g5dual.3-net:00944] [0,1,2] setting up session dir with
> [g5dual.3-net:00944] universe default-universe-20821
> [g5dual.3-net:00944] user motte
> [g5dual.3-net:00944] host G5Dual.3-net
> [g5dual.3-net:00944] jobid 1
> [g5dual.3-net:00944] procid 2
> [g5dual.3-net:00944] procdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/2
> [g5dual.3-net:00944] jobdir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00944] unidir:
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00944] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00944] tmp: /tmp
> [g4d003.3-net:00398] [0,1,4] setting up session dir with
> [g4d003.3-net:00398] universe default-universe-20821
> [g4d003.3-net:00398] user motte
> [g4d003.3-net:00398] host g4d003.3-net
> [g4d003.3-net:00398] jobid 1
> [g4d003.3-net:00398] procid 4
> [g4d003.3-net:00398] procdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1/4
> [g4d003.3-net:00398] jobdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1
> [g4d003.3-net:00398] unidir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00398] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00398] tmp: /tmp
> [g4d003.3-net:00400] [0,1,5] setting up session dir with
> [g4d003.3-net:00400] universe default-universe-20821
> [g4d003.3-net:00400] user motte
> [g4d003.3-net:00400] host g4d003.3-net
> [g4d003.3-net:00400] jobid 1
> [g4d003.3-net:00400] procid 5
> [g4d003.3-net:00400] procdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1/5
> [g4d003.3-net:00400] jobdir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1
> [g4d003.3-net:00400] unidir:
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00400] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00400] tmp: /tmp
> [powerbook.2-net:20821] spawn: in job_state_callback(jobid =
> 1, state = 0x4)
> [powerbook.2-net:20821] Info: Setting up debugger process table for
> applications
> MPIR_being_debugged = 0
> MPIR_debug_gate = 0
> MPIR_debug_state = 1
> MPIR_acquired_pre_main = 0
> MPIR_i_am_starter = 0
> MPIR_proctable_size = 7
> MPIR_proctable:
> (i, host, exe, pid) = (0, G5Dual.3-net,
> /Network/CFD/MVH-1.0/vhone, 940)
> (i, host, exe, pid) = (1, G5Dual.3-net,
> /Network/CFD/MVH-1.0/vhone, 942)
> (i, host, exe, pid) = (2, G5Dual.3-net,
> /Network/CFD/MVH-1.0/vhone, 944)
> (i, host, exe, pid) = (3, G5Dual.3-net,
> /Network/CFD/MVH-1.0/vhone, 946)
> (i, host, exe, pid) = (4, g4d003.3-net,
> /Network/CFD/MVH-1.0/vhone, 398)
> (i, host, exe, pid) = (5, g4d003.3-net,
> /Network/CFD/MVH-1.0/vhone, 400)
> (i, host, exe, pid) = (6, Powerbook.2-net,
> /Network/CFD/MVH-1.0/vhone, 20826)
> [powerbook.2-net:20826] [0,1,6] ompi_mpi_init completed
> [g5dual.3-net:00940] [0,1,0] ompi_mpi_init completed
> [g5dual.3-net:00942] [0,1,1] ompi_mpi_init completed
> [g5dual.3-net:00944] [0,1,2] ompi_mpi_init completed
> [g5dual.3-net:00946] [0,1,3] ompi_mpi_init completed
> [g4d003.3-net:00398] [0,1,4] ompi_mpi_init completed
> [g4d003.3-net:00400] [0,1,5] ompi_mpi_init completed
> [powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
> [powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
> [powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
> [powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------
> ------------
> WARNING: A process refused to die!
>
> Host: powerbook.2-net
> PID: 20826
>
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------
> ------------
> --------------------------------------------------------------
> ------------
> WARNING: A process refused to die!
>
> Host: g4d003.3-net
> PID: 398
>
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------
> ------------
> --------------------------------------------------------------
> ------------
> (skipped)
>
>
>
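
On the netmask question at the top: the quoted message gives the mask as 255.255.255.0, under which 192.168.2.x and 192.168.3.x are separate networks, whereas a wider mask would put them on the same one. A minimal Python sketch of that check follows, using only the standard `ipaddress` module. The address 192.168.2.3 is taken from the debug output; 192.168.3.10 and the helper name `same_subnet` are made up for illustration. This only shows how the mask changes what counts as "the same subnet"; it is not a diagnosis of the MPI_Allreduce failure.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both IPv4 addresses fall in the same network under netmask."""
    # strict=False lets us pass a host address and have the host bits masked off.
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


# 192.168.2.3 appears in the debug output above; 192.168.3.10 is a
# hypothetical stand-in for one of the 192.168.3.x nodes.
WLAN_HOST = "192.168.2.3"
GIGE_HOST = "192.168.3.10"

if __name__ == "__main__":
    for mask in ("255.255.255.0", "255.255.254.0"):
        print(mask, same_subnet(WLAN_HOST, GIGE_HOST, mask))
```

With the reported mask the two hosts compare as being on different networks; with 255.255.254.0 they would compare as being on the same one, which is why the actual mask on each interface is worth confirming.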