[OMPI users] OS X, OpenMPI 1.1: An error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD
Hi All,

when the nodes belong to different subnets the following error messages pop up:

[powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
[powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
[powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
[powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)

Here hostfile sets up three nodes in two subnets (192.168.3.x and 192.168.2.x with mask 255.255.255.0). The 192.168.3.x-nodes are connected via Gigabit-Ethernet, the 192.168.2.x-nodes are connected via WLAN.

Frank

This is the full output:

[powerbook:/Network/CFD/MVH-1.0] motte% mpirun -d -np 7 --hostfile ./hostfile /Network/CFD/MVH-1.0/vhone
[powerbook.2-net:20821] procdir: (null)
[powerbook.2-net:20821] jobdir: (null)
[powerbook.2-net:20821] unidir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
[powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
[powerbook.2-net:20821] tmp: /tmp
[powerbook.2-net:20821] connect_uni: contact info read
[powerbook.2-net:20821] connect_uni: connection not allowed
[powerbook.2-net:20821] [0,0,0] setting up session dir with
[powerbook.2-net:20821] tmpdir /tmp
[powerbook.2-net:20821] universe default-universe-20821
[powerbook.2-net:20821] user motte
[powerbook.2-net:20821] host powerbook.2-net
[powerbook.2-net:20821] jobid 0
[powerbook.2-net:20821] procid 0
[powerbook.2-net:20821] procdir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0/0
[powerbook.2-net:20821] jobdir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0
[powerbook.2-net:20821] unidir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821
[powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
[powerbook.2-net:20821] tmp: /tmp
[powerbook.2-net:20821] [0,0,0] contact_file /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/universe-setup.txt
[powerbook.2-net:20821] [0,0,0] wrote setup file
[powerbook.2-net:20821] pls:rsh: local csh: 1, local bash: 0
[powerbook.2-net:20821] pls:rsh: assuming same remote shell as local shell
[powerbook.2-net:20821] pls:rsh: remote csh: 1, remote bash: 0
[powerbook.2-net:20821] pls:rsh: final template argv:
[powerbook.2-net:20821] pls:rsh: /usr/bin/ssh orted --debug --bootproxy 1 --name --num_procs 4 --vpid_start 0 --nodename --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20821] pls:rsh: launching on node Powerbook.2-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: Powerbook.2-net is a LOCAL node
[powerbook.2-net:20821] pls:rsh: changing to directory /Users/motte
[powerbook.2-net:20821] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 4 --vpid_start 0 --nodename Powerbook.2-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20822] [0,0,1] setting up session dir with
[powerbook.2-net:20822] universe default-universe-20821
[powerbook.2-net:20822] user motte
[powerbook.2-net:20822] host Powerbook.2-net
[powerbook.2-net:20822] jobid 0
[powerbook.2-net:20822] procid 1
[powerbook.2-net:20822] procdir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0/1
[powerbook.2-net:20822] jobdir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0
[powerbook.2-net:20822] unidir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
[powerbook.2-net:20822] top: openmpi-sessions-motte@Powerbook.2-net_0
[powerbook.2-net:20822] tmp: /tmp
[powerbook.2-net:20821] pls:rsh: launching on node g4d003.3-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: g4d003.3-net is a REMOTE node
[powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh g4d003.3-net orted --debug --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename g4d003.3-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20821] pls:rsh: launching on node G5Dual.3-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: G5Dual.3-net is a REMOTE node
[powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net orted --debug --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 --nodename G5Dual.3-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
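The subnet layout described in the post (192.168.2.x and 192.168.3.x, mask 255.255.255.0) can be double-checked with a few lines of Python. This is only a minimal sketch using the standard `ipaddress` module. The address 192.168.2.3 is taken from the `--nsreplica` URI in the log, while 192.168.3.10 and the helper names `subnet_of` and `same_subnet` are hypothetical and chosen purely for illustration.

```python
import ipaddress

NETMASK = "255.255.255.0"  # mask stated in the post


def subnet_of(address, netmask=NETMASK):
    """Return the network that `address` falls into under `netmask`."""
    return ipaddress.ip_interface(f"{address}/{netmask}").network


def same_subnet(a, b, netmask=NETMASK):
    """True if both addresses share a network under `netmask`."""
    return subnet_of(a, netmask) == subnet_of(b, netmask)


head_node = "192.168.2.3"      # from the --nsreplica URI in the log
gigabit_node = "192.168.3.10"  # hypothetical address on the 192.168.3.x side

print(subnet_of(head_node))
print(subnet_of(gigabit_node))
print(same_subnet(head_node, gigabit_node))
```

With this mask the two addresses land in different networks, so traffic between the WLAN and Gigabit-Ethernet nodes has to be routed rather than delivered directly. With a wider mask such as 255.255.254.0, both ranges would fall into one network.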
Re: [OMPI users] OS X, OpenMPI 1.1: An error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD
A few clarifying questions:

- What is your netmask on these hosts?
- Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep within the application?
- Can you replicate this with a simple MPI application that essentially calls MPI_INIT, MPI_ALLREDUCE, and MPI_FINALIZE?
- Can you replicate this with a simple MPI app that does an MPI_SEND / MPI_RECV between two processes on the different subnets?

Thanks.

> -Original Message-
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of openmpi-user
> Sent: Sunday, July 02, 2006 7:20 AM
> To: us...@open-mpi.org
> Subject: [OMPI users] OS X, OpenMPI 1.1: An error occurred in
> MPI_Allreduce on communicator MPI_COMM_WORLD
>
> Hi All,
>
> when the nodes belong to different subnets the following error messages pop up:
> [powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
> [powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
> [powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
> [powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> Here hostfile sets up three nodes in two subnets (192.168.3.x and
> 192.168.2.x with mask 255.255.255.0). The 192.168.3.x-nodes are
> connected via Gigabit-Ethernet, the 192.168.2.x-nodes are connected via
> WLAN.