[OMPI users] Connection Errors: Socket is not connected (57), but one message to each peer works at first; depends on machine order.

2011-03-05 Thread atexannamedbob

Dear Open-mpi users,
Currently we are running on four identical iMacs (Mac OS X 10.5.8), all on the same
network, using Open MPI 1.4.1.
We get an error that we cannot seem to find any help on.
Sometimes we get the error Socket Connection (79), and sometimes:
[30451,1],1][btl_tcp_endpoint.c:298:mca_btl_tcp_endpoint_send_blocking] send() 
failed: Socket is not connected (57)
The strangest thing is that the error only happens when we run certain machines
in a certain order.


echo $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/texbin

mpicc -m64 -lpthread -w -lm -std="c99" inc/*.h lib/*.c -o dispatcher

The strange thing is that all dispatchers are able to send one small message to
each other before this error occurs.
Does not work:
mpirun -H juhu,hama -n 2 dispatcher
mpirun -H hama,juhu -n 2 dispatcher
mpirun -H hama,tuvalu -n 2 dispatcher
mpirun -H juhu,tuvalu -n 2 dispatcher
Works:
mpirun -H tuvalu,juhu -n 2 dispatcher
mpirun -H tuvalu,hama -n 2 dispatcher
Dispatcher is a multithreaded application that sends messages to other 
dispatchers.


ifconfig output for machine 1 with the problem

lo0: flags=8049 mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet 127.0.0.1 netmask 0xff000000 
inet6 ::1 prefixlen 128 
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
fw0: flags=8863 mtu 4078
lladdr 00:1f:f3:ff:fe:6e:5d:26 
media: autoselect  status: inactive
supported media: autoselect 
en1: flags=8823 mtu 1500
ether 00:1f:5b:c9:3b:8f 
media: autoselect () status: inactive
supported media: autoselect
en0: flags=8863 mtu 1500
inet 131.179.224.186 netmask 0xffffff00 broadcast 131.179.224.255
ether 00:1f:f3:59:d2:3d 
media: autoselect (100baseTX ) status: active
supported media: autoselect 10baseT/UTP 100baseTX 1000baseT none
vmnet8: flags=8863 mtu 1500
inet 172.16.181.1 netmask 0xffffff00 broadcast 172.16.181.255
ether 00:50:56:c0:00:08 
vmnet1: flags=8863 mtu 1500
inet 172.16.32.1 netmask 0xffffff00 broadcast 172.16.32.255
ether 00:50:56:c0:00:01 

ifconfig output for machine 2 with the problem



lo0: flags=8049 mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet 127.0.0.1 netmask 0xff000000 
inet6 ::1 prefixlen 128 
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
fw0: flags=8863 mtu 4078
lladdr 00:1f:5b:ff:fe:20:ae:1e 
media: autoselect  status: inactive
supported media: autoselect 
en1: flags=8823 mtu 1500
ether 00:1f:5b:c9:10:1d 
media: autoselect () status: inactive
supported media: autoselect
en0: flags=8863 mtu 1500
inet6 fe80::21e:c2ff:fe1a:c673%en0 prefixlen 64 scopeid 0x6 
inet 131.179.224.185 netmask 0xffffff00 broadcast 131.179.224.255
ether 00:1e:c2:1a:c6:73 
media: autoselect (100baseTX ) status: active
supported media: autoselect 10baseT/UTP 100baseTX 1000baseT none
vboxnet0: flags=8842 mtu 1500
ether 0a:00:27:00:00:00 
vmnet1: flags=8863 mtu 1500
inet 192.168.138.1 netmask 0xffffff00 broadcast 192.168.138.255
ether 00:50:56:c0:00:01 
vmnet8: flags=8863 mtu 1500
inet 192.168.56.1 netmask 0xffffff00 broadcast 192.168.56.255
ether 00:50:56:c0:00:08 
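A side note on reading these dumps: the hex netmasks appear truncated by the archive; a /24 subnet, as implied by the .255 broadcast addresses on en0 and the vmnet interfaces, would normally print as 0xffffff00 in BSD ifconfig output. Decoding such a mask is a small exercise; here is a sketch (the helper name `decode_netmask` is my own, not from the thread):

```python
# Decode a BSD ifconfig-style hex netmask (e.g. "0xffffff00")
# into dotted-quad form plus a CIDR prefix length.
def decode_netmask(hex_mask):
    value = int(hex_mask, 16)
    # Extract the four octets, most significant first.
    quad = ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    # The prefix length is the number of set bits in the mask.
    prefix = bin(value).count("1")
    return quad, prefix

# A /24 mask, as used on en0 and the vmnet interfaces here:
print(decode_netmask("0xffffff00"))   # ('255.255.255.0', 24)
# The loopback /8 mask:
print(decode_netmask("0xff000000"))   # ('255.0.0.0', 8)
```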


Thanks!
Oren
ompi_info output:
   Package: Open MPI admin4810@juhu Distribution
Open MPI: 1.4.1
   Open MPI SVN revision: r22421
   Open MPI release date: Jan 14, 2010
Open RTE: 1.4.1
   Open RTE SVN revision: r22421
   Open RTE release date: Jan 14, 2010
OPAL: 1.4.1
   OPAL SVN revision: r22421
   OPAL release date: Jan 14, 2010
Ident string: 1.4.1
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.1)
   MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.4.1)
   MCA paffinity: test (MCA v2.0, API v2.0, Component v1.4.1)
   MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.1)
   MCA carto: file (MCA v2.0, API v2.0, Component v1.4.1)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.1)
   MCA timer: darwin (MCA v2.0, API v2.0, Component v1.4.1)
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.1)
 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.1)
 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.1)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.1)
   MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.1)
   MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.1)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.1)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.1)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.1)
   

Re: [OMPI users] Connection Errors: Socket is not connected (57), but one message to each peer works at first; depends on machine order.

2011-03-29 Thread atexannamedbob

This was fixed with
--mca btl_tcp_if_exclude lo,eth0,vmnet0,vmnet1,vmnet8
The machines were trying to connect to each other through the virtual network
interfaces on each machine, which the other hosts cannot reach.
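For anyone hitting the same symptom: rather than excluding every virtual interface by name, Open MPI also supports whitelisting just the physical NIC via btl_tcp_if_include (the include and exclude parameters are mutually exclusive), and the setting can be made permanent in an MCA parameter file. A sketch, using the interface and host names from this thread:

```
# Restrict the TCP BTL to the wired interface (en0 on these iMacs):
mpirun --mca btl_tcp_if_include en0 -H tuvalu,juhu -n 2 dispatcher

# Or set it once in $HOME/.openmpi/mca-params.conf:
btl_tcp_if_include = en0
```

Whitelisting is often more robust than a blacklist, since a new virtual interface (e.g. vboxnet0, which appears on machine 2 but is not in the exclude list above) will not silently reintroduce the problem.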
Thanks!
 
 

