A few clarifying questions:

What is your netmask on these hosts?
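(On OS X you can check this with ifconfig; the interface name below is a guess -- yours may be en1 for Airport, etc. The netmask is printed in hex, e.g. 0xffffff00 for 255.255.255.0.)

```shell
# Show the address and netmask of the first Ethernet interface.
# Replace en0 with whichever interface each host actually uses.
ifconfig en0 | grep "inet "
```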

Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep
within the application?  Can you replicate this with a simple MPI
application that essentially calls MPI_INIT, MPI_ALLREDUCE, and
MPI_FINALIZE?
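Something along these lines should do it (untested sketch; the file name and the sum-of-ranks payload are just for illustration):

```c
/* allreduce_test.c -- minimal MPI_Allreduce reproducer.
   Build: mpicc allreduce_test.c -o allreduce_test
   Run:   mpirun -np 7 --hostfile ./hostfile ./allreduce_test */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Each rank contributes its rank number; every rank should end
       up with 0 + 1 + ... + (n-1). */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: sum = %d\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```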

Can you replicate this with a simple MPI app that does an MPI_SEND /
MPI_RECV between two processes on the different subnets?  
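For example (again an untested sketch -- put one host from each subnet in the hostfile and run with -np 2):

```c
/* sendrecv_test.c -- point-to-point test across the two subnets.
   Build: mpicc sendrecv_test.c -o sendrecv_test
   Run:   mpirun -np 2 --hostfile ./hostfile_two_subnets ./sendrecv_test */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, msg = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", msg);
    }
    MPI_Finalize();
    return 0;
}
```

If this hangs or aborts, it would point at TCP connectivity between the subnets rather than at anything Allreduce-specific.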

Thanks.

> -----Original Message-----
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of openmpi-user
> Sent: Sunday, July 02, 2006 7:20 AM
> To: us...@open-mpi.org
> Subject: [OMPI users] OS X, OpenMPI 1.1: An error occurred in 
> MPI_Allreduce on communicator MPI_COMM_WORLD
> 
> Hi All,
> 
> When the nodes belong to different subnets, the following error 
> messages pop up:
> [powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
> [powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
> [powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
> [powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)
> 
> Here the hostfile sets up three nodes in two subnets (192.168.3.x 
> and 192.168.2.x, netmask 255.255.255.0). The 192.168.3.x nodes are 
> connected via Gigabit Ethernet, the 192.168.2.x nodes via WLAN.
> 
> Frank
> 
> 
> This is the full output:
> 
> [powerbook:/Network/CFD/MVH-1.0] motte% mpirun -d -np 7 --hostfile 
> ./hostfile /Network/CFD/MVH-1.0/vhone
> [powerbook.2-net:20821] procdir: (null)
> [powerbook.2-net:20821] jobdir: (null)
> [powerbook.2-net:20821] unidir: 
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> [powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
> [powerbook.2-net:20821] tmp: /tmp
> [powerbook.2-net:20821] connect_uni: contact info read
> [powerbook.2-net:20821] connect_uni: connection not allowed
> [powerbook.2-net:20821] [0,0,0] setting up session dir with
> [powerbook.2-net:20821]         tmpdir /tmp
> [powerbook.2-net:20821]         universe default-universe-20821
> [powerbook.2-net:20821]         user motte
> [powerbook.2-net:20821]         host powerbook.2-net
> [powerbook.2-net:20821]         jobid 0
> [powerbook.2-net:20821]         procid 0
> [powerbook.2-net:20821] procdir: 
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> -20821/0/0
> [powerbook.2-net:20821] jobdir: 
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0
> [powerbook.2-net:20821] unidir: 
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
> [powerbook.2-net:20821] tmp: /tmp
> [powerbook.2-net:20821] [0,0,0] contact_file 
> /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
> -20821/universe-setup.txt
> [powerbook.2-net:20821] [0,0,0] wrote setup file
> [powerbook.2-net:20821] pls:rsh: local csh: 1, local bash: 0
> [powerbook.2-net:20821] pls:rsh: assuming same remote shell 
> as local shell
> [powerbook.2-net:20821] pls:rsh: remote csh: 1, remote bash: 0
> [powerbook.2-net:20821] pls:rsh: final template argv:
> [powerbook.2-net:20821] pls:rsh:     /usr/bin/ssh <template> orted 
> --debug --bootproxy 1 --name <template> --num_procs 4 --vpid_start 0 
> --nodename <template> --universe 
> motte@powerbook.2-net:default-universe-20821 --nsreplica 
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica 
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20821] pls:rsh: launching on node Powerbook.2-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting 
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: Powerbook.2-net is a LOCAL node
> [powerbook.2-net:20821] pls:rsh: changing to directory /Users/motte
> [powerbook.2-net:20821] pls:rsh: executing: orted --debug 
> --bootproxy 1 
> --name 0.0.1 --num_procs 4 --vpid_start 0 --nodename Powerbook.2-net 
> --universe motte@powerbook.2-net:default-universe-20821 --nsreplica 
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica 
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20822] [0,0,1] setting up session dir with
> [powerbook.2-net:20822]         universe default-universe-20821
> [powerbook.2-net:20822]         user motte
> [powerbook.2-net:20822]         host Powerbook.2-net
> [powerbook.2-net:20822]         jobid 0
> [powerbook.2-net:20822]         procid 1
> [powerbook.2-net:20822] procdir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> -20821/0/1
> [powerbook.2-net:20822] jobdir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0
> [powerbook.2-net:20822] unidir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20822] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:20822] tmp: /tmp
> [powerbook.2-net:20821] pls:rsh: launching on node g4d003.3-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting 
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: g4d003.3-net is a REMOTE node
> [powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh g4d003.3-net 
> orted --debug --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 
> --nodename g4d003.3-net --universe 
> motte@powerbook.2-net:default-universe-20821 --nsreplica 
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica 
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [powerbook.2-net:20821] pls:rsh: launching on node G5Dual.3-net
> [powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting 
> mpi_yield_when_idle to 0
> [powerbook.2-net:20821] pls:rsh: G5Dual.3-net is a REMOTE node
> [powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net 
> orted --debug --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 
> --nodename G5Dual.3-net --universe 
> motte@powerbook.2-net:default-universe-20821 --nsreplica 
> "0.0.0;tcp://192.168.2.3:54609" --gprreplica 
> "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
> [g4d003.3-net:00396] [0,0,2] setting up session dir with
> [g4d003.3-net:00396]    universe default-universe-20821
> [g4d003.3-net:00396]    user motte
> [g4d003.3-net:00396]    host g4d003.3-net
> [g4d003.3-net:00396]    jobid 0
> [g4d003.3-net:00396]    procid 2
> [g4d003.3-net:00396] procdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/0/2
> [g4d003.3-net:00396] jobdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/0
> [g4d003.3-net:00396] unidir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00396] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00396] tmp: /tmp
> [g5dual.3-net:00938] [0,0,3] setting up session dir with
> [g5dual.3-net:00938]    universe default-universe-20821
> [g5dual.3-net:00938]    user motte
> [g5dual.3-net:00938]    host G5Dual.3-net
> [g5dual.3-net:00938]    jobid 0
> [g5dual.3-net:00938]    procid 3
> [g5dual.3-net:00938] procdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/0/3
> [g5dual.3-net:00938] jobdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/0
> [g5dual.3-net:00938] unidir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00938] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00938] tmp: /tmp
> [powerbook.2-net:20826] [0,1,6] setting up session dir with
> [powerbook.2-net:20826]         universe default-universe-20821
> [powerbook.2-net:20826]         user motte
> [powerbook.2-net:20826]         host Powerbook.2-net
> [powerbook.2-net:20826]         jobid 1
> [powerbook.2-net:20826]         procid 6
> [powerbook.2-net:20826] procdir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe
> -20821/1/6
> [powerbook.2-net:20826] jobdir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/1
> [powerbook.2-net:20826] unidir: 
> /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
> [powerbook.2-net:20826] top: openmpi-sessions-motte@Powerbook.2-net_0
> [powerbook.2-net:20826] tmp: /tmp
> [g5dual.3-net:00940] [0,1,0] setting up session dir with
> [g5dual.3-net:00940]    universe default-universe-20821
> [g5dual.3-net:00940]    user motte
> [g5dual.3-net:00940]    host G5Dual.3-net
> [g5dual.3-net:00940]    jobid 1
> [g5dual.3-net:00940]    procid 0
> [g5dual.3-net:00940] procdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/0
> [g5dual.3-net:00940] jobdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00940] unidir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00940] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00940] tmp: /tmp
> [g5dual.3-net:00946] [0,1,3] setting up session dir with
> [g5dual.3-net:00946]    universe default-universe-20821
> [g5dual.3-net:00946]    user motte
> [g5dual.3-net:00946]    host G5Dual.3-net
> [g5dual.3-net:00946]    jobid 1
> [g5dual.3-net:00946]    procid 3
> [g5dual.3-net:00946] procdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/3
> [g5dual.3-net:00946] jobdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00946] unidir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00946] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00946] tmp: /tmp
> [g5dual.3-net:00942] [0,1,1] setting up session dir with
> [g5dual.3-net:00942]    universe default-universe-20821
> [g5dual.3-net:00942]    user motte
> [g5dual.3-net:00942]    host G5Dual.3-net
> [g5dual.3-net:00942]    jobid 1
> [g5dual.3-net:00942]    procid 1
> [g5dual.3-net:00942] procdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/1
> [g5dual.3-net:00942] jobdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00942] unidir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00942] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00942] tmp: /tmp
> [g5dual.3-net:00944] [0,1,2] setting up session dir with
> [g5dual.3-net:00944]    universe default-universe-20821
> [g5dual.3-net:00944]    user motte
> [g5dual.3-net:00944]    host G5Dual.3-net
> [g5dual.3-net:00944]    jobid 1
> [g5dual.3-net:00944]    procid 2
> [g5dual.3-net:00944] procdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1/2
> [g5dual.3-net:00944] jobdir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821/1
> [g5dual.3-net:00944] unidir: 
> /tmp/openmpi-sessions-motte@G5Dual.3-net_0/default-universe-20821
> [g5dual.3-net:00944] top: openmpi-sessions-motte@G5Dual.3-net_0
> [g5dual.3-net:00944] tmp: /tmp
> [g4d003.3-net:00398] [0,1,4] setting up session dir with
> [g4d003.3-net:00398]    universe default-universe-20821
> [g4d003.3-net:00398]    user motte
> [g4d003.3-net:00398]    host g4d003.3-net
> [g4d003.3-net:00398]    jobid 1
> [g4d003.3-net:00398]    procid 4
> [g4d003.3-net:00398] procdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1/4
> [g4d003.3-net:00398] jobdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1
> [g4d003.3-net:00398] unidir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00398] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00398] tmp: /tmp
> [g4d003.3-net:00400] [0,1,5] setting up session dir with
> [g4d003.3-net:00400]    universe default-universe-20821
> [g4d003.3-net:00400]    user motte
> [g4d003.3-net:00400]    host g4d003.3-net
> [g4d003.3-net:00400]    jobid 1
> [g4d003.3-net:00400]    procid 5
> [g4d003.3-net:00400] procdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1/5
> [g4d003.3-net:00400] jobdir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821/1
> [g4d003.3-net:00400] unidir: 
> /tmp/openmpi-sessions-motte@g4d003.3-net_0/default-universe-20821
> [g4d003.3-net:00400] top: openmpi-sessions-motte@g4d003.3-net_0
> [g4d003.3-net:00400] tmp: /tmp
> [powerbook.2-net:20821] spawn: in job_state_callback(jobid = 
> 1, state = 0x4)
> [powerbook.2-net:20821] Info: Setting up debugger process table for 
> applications
>   MPIR_being_debugged = 0
>   MPIR_debug_gate = 0
>   MPIR_debug_state = 1
>   MPIR_acquired_pre_main = 0
>   MPIR_i_am_starter = 0
>   MPIR_proctable_size = 7
>   MPIR_proctable:
>     (i, host, exe, pid) = (0, G5Dual.3-net, 
> /Network/CFD/MVH-1.0/vhone, 940)
>     (i, host, exe, pid) = (1, G5Dual.3-net, 
> /Network/CFD/MVH-1.0/vhone, 942)
>     (i, host, exe, pid) = (2, G5Dual.3-net, 
> /Network/CFD/MVH-1.0/vhone, 944)
>     (i, host, exe, pid) = (3, G5Dual.3-net, 
> /Network/CFD/MVH-1.0/vhone, 946)
>     (i, host, exe, pid) = (4, g4d003.3-net, 
> /Network/CFD/MVH-1.0/vhone, 398)
>     (i, host, exe, pid) = (5, g4d003.3-net, 
> /Network/CFD/MVH-1.0/vhone, 400)
>     (i, host, exe, pid) = (6, Powerbook.2-net, 
> /Network/CFD/MVH-1.0/vhone, 20826)
> [powerbook.2-net:20826] [0,1,6] ompi_mpi_init completed
> [g5dual.3-net:00940] [0,1,0] ompi_mpi_init completed
> [g5dual.3-net:00942] [0,1,1] ompi_mpi_init completed
> [g5dual.3-net:00944] [0,1,2] ompi_mpi_init completed
> [g5dual.3-net:00946] [0,1,3] ompi_mpi_init completed
> [g4d003.3-net:00398] [0,1,4] ompi_mpi_init completed
> [g4d003.3-net:00400] [0,1,5] ompi_mpi_init completed
> [powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
> [powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
> [powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
> [powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)
> --------------------------------------------------------------
> ------------
> WARNING: A process refused to die!
> 
> Host: powerbook.2-net
> PID:  20826
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------
> ------------
> --------------------------------------------------------------
> ------------
> WARNING: A process refused to die!
> 
> Host: g4d003.3-net
> PID:  398
> 
> This process may still be running and/or consuming resources.
> --------------------------------------------------------------
> ------------
> --------------------------------------------------------------
> ------------
> (skipped)