On Jun 11, 2012, at 11:15 AM, BOUVIER Benjamin wrote:

> Thanks for your hints, Jeff.
> I've just tried with all firewalls disabled on the machines involved, but 
> the issue remains.
> 
> # /etc/init.d/ip6tables status
> ip6tables: Firewall is not running.
> # /etc/init.d/iptables status
> iptables: Firewall is not running.

Ok.

> The machines have the host names "node1", "node2" and "node3".
> I launch the basic program on one machine, with node1 and node2 as hosts. 
> Running `netstat -a | grep node1` on node2 shows that node1 and node2 are 
> connected over TCP, since the connection is marked ESTABLISHED. I see the 
> same thing when I run `netstat -a | grep node2` on node1. However, the 
> program still hangs.

I'm not entirely clear which combinations are working and which are not.  Can 
you specify which ones work?  You might want to try the ring_c.c program in 
the OMPI examples/ directory -- it's a trivial "send a message around in a 
ring" program that runs with 2 or more processes.

- on node1, "mpirun --host node1,node2 ring_c"

- on node1, "mpirun --host node1,node3 ring_c"

- on node1, "mpirun --host node2,node3 ring_c"

- on node1, "mpirun --host node1,node2,node3 ring_c"

Repeat all 4 from node2.
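
To make the ring pattern concrete, here is a serial C model of what a ring program does. It is a sketch, not the ring_c.c source from examples/: it uses no MPI, and the helpers ring_next(), ring_prev() and simulate_ring() are hypothetical names. In an MPI ring program, each hop would be an MPI_Send() to the next rank matched by an MPI_Recv() from the previous rank. Point-to-point traffic therefore only flows between neighbouring ranks, which is why comparing which host combinations hang can help narrow the problem down to a particular pair of hosts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the rank that `rank` sends to in a ring of `size`. */
int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Hypothetical helper: the rank that `rank` receives from. */
int ring_prev(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + size - 1) % size;
}

/* Serial model of the ring (no MPI): rank 0 starts with the token, and each
 * hop moves it to the next rank until it has gone around `laps` times.
 * Returns the number of hops. With real processes, every hop is one
 * send/receive pair between two neighbouring ranks, so a hop that never
 * completes points at one specific pair of hosts. */
int simulate_ring(int size, int laps)
{
    int holder = 0; /* rank currently holding the token */
    int hops = 0;

    for (int lap = 0; lap < laps; lap++) {
        do {
            int to = ring_next(holder, size);
            /* The receiver expects the token from exactly this sender. */
            assert(ring_prev(to, size) == holder);
            printf("hop %d: rank %d -> rank %d\n", hops, holder, to);
            holder = to;
            hops++;
        } while (holder != 0);
    }
    return hops;
}
```

For example, simulate_ring(3, 1) prints three hops (rank 0 -> 1, 1 -> 2, 2 -> 0). Assuming "--host node1,node2,node3" places ranks 0, 1 and 2 on node1, node2 and node3 in that order, those hops correspond to the host pairs node1/node2, node2/node3 and node3/node1.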


> What else could cause that failure?
> --
> Benjamin BOUVIER 
> 
> ________________________________________
> To start, I would ensure that all firewalling (e.g., iptables) is disabled 
> on all machines involved.
> 
> On Jun 11, 2012, at 10:16 AM, BOUVIER Benjamin wrote:
> 
>> Hi,
>> 
>>> I'd guess that running NetPIPE with 3 procs may be undefined.
>> 
>> It is indeed undefined. Running NetPIPE locally with 3 processes hangs on 
>> my computer.
>> 
>> This issue is especially strange because the example program runs over 
>> the network without any problem with the MPICH2 implementation, for 2 
>> processes.
>> 
>> However, with MPICH2 it fails with 3 processes and also blocks on connect 
>> ("Connection refused"), which could indicate that this is actually a 
>> network issue affecting both MPICH2 and OMPI. I don't know how many 
>> connections OMPI uses to send the data in the example program, but if it 
>> opens 2 connections where MPICH2 uses only one for the same program (which 
>> is another hypothesis), the number of connections may be the right thing 
>> to look at. I'll ask MPICH2 users on their mailing list to get their 
>> opinion.
>> 
>> Now that I know the program fails with both the OMPI and MPICH2 
>> implementations, I guess the problem doesn't depend on the MPI 
>> implementation.
>> 
>> If you have any ideas or comments, I would be pleased to hear them.
>> 
>> --
>> Benjamin Bouvier
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 


-- 
Jeff Squyres
jsquy...@cisco.com

