[OMPI users] OS X, OpenMPI 1.1: An error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD

2006-07-02 Thread openmpi-user

Hi All,

When the nodes belong to different subnets, the following error messages
pop up:

[powerbook.2-net:20826] *** An error occurred in MPI_Allreduce
[powerbook.2-net:20826] *** on communicator MPI_COMM_WORLD
[powerbook.2-net:20826] *** MPI_ERR_INTERN: internal error
[powerbook.2-net:20826] *** MPI_ERRORS_ARE_FATAL (goodbye)

Here the hostfile sets up three nodes in two subnets (192.168.3.x and
192.168.2.x, both with netmask 255.255.255.0). The 192.168.3.x nodes are
connected via Gigabit Ethernet, the 192.168.2.x nodes via WLAN.
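As a quick check of the addressing described above, here is a minimal Python sketch using the standard-library `ipaddress` module. `192.168.2.3` is the mpirun host's address as it appears in the `--nsreplica` URI of the debug output below; `192.168.3.10` is a hypothetical stand-in for one of the Gigabit-Ethernet nodes, whose real addresses are not given. Under mask 255.255.255.0 the two land in different /24 networks, so traffic between the WLAN node and the Gigabit-Ethernet nodes has to pass through a router.

```python
import ipaddress


def network_of(addr, netmask):
    """Return the network ('a.b.c.d/prefix') that `addr` belongs to under `netmask`."""
    # strict=False accepts a host address instead of requiring the network address.
    return str(ipaddress.IPv4Network(f"{addr}/{netmask}", strict=False))


def same_subnet(addr_a, addr_b, netmask):
    """True if both IPv4 addresses fall in the same network under `netmask`."""
    return network_of(addr_a, netmask) == network_of(addr_b, netmask)


if __name__ == "__main__":
    mask = "255.255.255.0"
    wlan_host = "192.168.2.3"   # mpirun host, taken from the --nsreplica URI
    gige_host = "192.168.3.10"  # hypothetical address for a 192.168.3.x node
    print(network_of(wlan_host, mask))              # 192.168.2.0/24
    print(network_of(gige_host, mask))              # 192.168.3.0/24
    print(same_subnet(wlan_host, gige_host, mask))  # False
```

The functions only do address arithmetic; they say nothing about whether a working route between the two networks actually exists.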


Frank


This is the full output:

[powerbook:/Network/CFD/MVH-1.0] motte% mpirun -d -np 7 --hostfile ./hostfile /Network/CFD/MVH-1.0/vhone
[powerbook.2-net:20821] procdir: (null)
[powerbook.2-net:20821] jobdir: (null)
[powerbook.2-net:20821] unidir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe
[powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
[powerbook.2-net:20821] tmp: /tmp
[powerbook.2-net:20821] connect_uni: contact info read
[powerbook.2-net:20821] connect_uni: connection not allowed
[powerbook.2-net:20821] [0,0,0] setting up session dir with
[powerbook.2-net:20821] tmpdir /tmp
[powerbook.2-net:20821] universe default-universe-20821
[powerbook.2-net:20821] user motte
[powerbook.2-net:20821] host powerbook.2-net
[powerbook.2-net:20821] jobid 0
[powerbook.2-net:20821] procid 0
[powerbook.2-net:20821] procdir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0/0
[powerbook.2-net:20821] jobdir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/0
[powerbook.2-net:20821] unidir: /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821
[powerbook.2-net:20821] top: openmpi-sessions-motte@powerbook.2-net_0
[powerbook.2-net:20821] tmp: /tmp
[powerbook.2-net:20821] [0,0,0] contact_file /tmp/openmpi-sessions-motte@powerbook.2-net_0/default-universe-20821/universe-setup.txt
[powerbook.2-net:20821] [0,0,0] wrote setup file
[powerbook.2-net:20821] pls:rsh: local csh: 1, local bash: 0
[powerbook.2-net:20821] pls:rsh: assuming same remote shell as local shell
[powerbook.2-net:20821] pls:rsh: remote csh: 1, remote bash: 0
[powerbook.2-net:20821] pls:rsh: final template argv:
[powerbook.2-net:20821] pls:rsh: /usr/bin/ssh  orted --debug --bootproxy 1 --name  --num_procs 4 --vpid_start 0 --nodename  --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20821] pls:rsh: launching on node Powerbook.2-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: Powerbook.2-net is a LOCAL node
[powerbook.2-net:20821] pls:rsh: changing to directory /Users/motte
[powerbook.2-net:20821] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs 4 --vpid_start 0 --nodename Powerbook.2-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20822] [0,0,1] setting up session dir with
[powerbook.2-net:20822] universe default-universe-20821
[powerbook.2-net:20822] user motte
[powerbook.2-net:20822] host Powerbook.2-net
[powerbook.2-net:20822] jobid 0
[powerbook.2-net:20822] procid 1
[powerbook.2-net:20822] procdir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0/1
[powerbook.2-net:20822] jobdir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821/0
[powerbook.2-net:20822] unidir: /tmp/openmpi-sessions-motte@Powerbook.2-net_0/default-universe-20821
[powerbook.2-net:20822] top: openmpi-sessions-motte@Powerbook.2-net_0
[powerbook.2-net:20822] tmp: /tmp
[powerbook.2-net:20821] pls:rsh: launching on node g4d003.3-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: g4d003.3-net is a REMOTE node
[powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh g4d003.3-net orted --debug --bootproxy 1 --name 0.0.2 --num_procs 4 --vpid_start 0 --nodename g4d003.3-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0
[powerbook.2-net:20821] pls:rsh: launching on node G5Dual.3-net
[powerbook.2-net:20821] pls:rsh: not oversubscribed -- setting mpi_yield_when_idle to 0
[powerbook.2-net:20821] pls:rsh: G5Dual.3-net is a REMOTE node
[powerbook.2-net:20821] pls:rsh: executing: /usr/bin/ssh G5Dual.3-net orted --debug --bootproxy 1 --name 0.0.3 --num_procs 4 --vpid_start 0 --nodename G5Dual.3-net --universe motte@powerbook.2-net:default-universe-20821 --nsreplica "0.0.0;tcp://192.168.2.3:54609" --gprreplica "0.0.0;tcp://192.168.2.3:54609" --mpi-call-yield 0

Re: [OMPI users] OS X, OpenMPI 1.1: An error occurred in MPI_Allreduce on communicator MPI_COMM_WORLD

2006-07-02 Thread Jeff Squyres (jsquyres)
A few clarifying questions:

What is your netmask on these hosts?

Where is the MPI_ALLREDUCE in your app -- right away, or somewhere deep
within the application?  Can you replicate this with a simple MPI
application that essentially calls MPI_INIT, MPI_ALLREDUCE, and
MPI_FINALIZE?
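A reproducer along those lines could look like the sketch below. It assumes the `mpi4py` Python bindings, which the thread does not mention, are installed on every node and built against the same Open MPI; the module name `allreduce_check` and its helpers are hypothetical. Importing `mpi4py.MPI` performs MPI_INIT, the buffer-based `Comm.Allreduce` issues MPI_ALLREDUCE (the lowercase, pickle-based `allreduce` need not), and mpi4py calls MPI_Finalize when the interpreter exits. Each rank contributes its own rank number, so every rank should receive 0 + 1 + ... + (size - 1).

```python
# allreduce_check.py: hypothetical MPI_INIT / MPI_ALLREDUCE / MPI_FINALIZE reproducer.
# Assumes mpi4py is installed on every node and linked against the same Open MPI.
from array import array


def expected_rank_sum(size):
    """0 + 1 + ... + (size - 1): what a SUM-allreduce over all ranks must yield."""
    return size * (size - 1) // 2


def allreduce_check(comm, op, int_type):
    """Allreduce each rank's number and compare the result with the closed form.

    `comm`, `op` and `int_type` are passed in (MPI.COMM_WORLD, MPI.SUM, MPI.INT)
    so that this function does not import mpi4py itself; `op` should be MPI.SUM.
    """
    send = array("i", [comm.Get_rank()])  # one C int per rank
    recv = array("i", [0])
    comm.Allreduce([send, int_type], [recv, int_type], op=op)
    total = recv[0]
    return total, total == expected_rank_sum(comm.Get_size())


def main():
    from mpi4py import MPI  # importing mpi4py.MPI initializes MPI (MPI_INIT)

    comm = MPI.COMM_WORLD
    total, ok = allreduce_check(comm, MPI.SUM, MPI.INT)
    print(f"rank {comm.Get_rank()}/{comm.Get_size()} on {MPI.Get_processor_name()}: "
          f"sum={total} {'ok' if ok else 'MISMATCH'}")
    # MPI_Finalize is called by mpi4py when the interpreter exits.
```

It could be launched, for example, as `mpirun -np 7 --hostfile ./hostfile python3 -c "import allreduce_check; allreduce_check.main()"`, assuming `allreduce_check.py` sits in a directory Python can import from on every node.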

Can you replicate this with a simple MPI app that does an MPI_SEND /
MPI_RECV between two processes on the different subnets?  
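A minimal point-to-point check could look like the following sketch, assuming the `mpi4py` Python bindings (not part of the thread) are installed on every node; the module name `pingpong_check` and its helpers are hypothetical. Rank 0 sends one C int to rank 1 with the buffer-based `Comm.Send`, rank 1 receives it with `Comm.Recv` and sends it back, so rank 0 should get its own value returned. Any ranks beyond the first two stay idle.

```python
# pingpong_check.py: hypothetical MPI_SEND / MPI_RECV check between ranks 0 and 1.
# Assumes mpi4py is installed on every node and linked against the same Open MPI.
from array import array


def ping_pong(comm, value, int_type):
    """Rank 0 sends `value` to rank 1, which echoes it back.

    Returns the echoed value on rank 0, the received value on rank 1,
    and None on any other rank. `int_type` is the MPI datatype, e.g. MPI.INT.
    """
    rank = comm.Get_rank()
    buf = array("i", [value])  # one C int
    if rank == 0:
        comm.Send([buf, int_type], dest=1, tag=0)
        comm.Recv([buf, int_type], source=1, tag=1)  # overwrite with the echo
        return buf[0]
    if rank == 1:
        comm.Recv([buf, int_type], source=0, tag=0)
        comm.Send([buf, int_type], dest=0, tag=1)
        return buf[0]
    return None


def main():
    from mpi4py import MPI  # importing mpi4py.MPI initializes MPI

    comm = MPI.COMM_WORLD
    if comm.Get_size() < 2:
        raise SystemExit("run with at least 2 processes")
    value = 12345
    result = ping_pong(comm, value, MPI.INT)
    if comm.Get_rank() == 0:
        print(f"rank 0 on {MPI.Get_processor_name()}: sent {value}, got back {result}",
              "ok" if result == value else "MISMATCH")
```

To put the two ranks on different subnets, a hypothetical two-line hostfile could list `powerbook.2-net slots=1` and `G5Dual.3-net slots=1`, launched with `mpirun -np 2 --hostfile ./hostfile2 python3 -c "import pingpong_check; pingpong_check.main()"` (the module has to be importable on both nodes).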

Thanks.

> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of openmpi-user
> Sent: Sunday, July 02, 2006 7:20 AM
> To: us...@open-mpi.org
> Subject: [OMPI users] OS X, OpenMPI 1.1: An error occurred in 
> MPI_Allreduce on communicator MPI_COMM_WORLD