On Jun 11, 2012, at 11:15 AM, BOUVIER Benjamin wrote:
> Thanks for your hints Jeff.
> I've just tried without any firewalls on involved machines, but the issue
> remains.
>
> # /etc/init.d/ip6tables status
> ip6tables: Firewall is not running.
> # /etc/init.d/iptables status
> iptables: Firewall is not running.
Ok.
> The machines have the host names "node1", "node2" and "node3".
> I launch the basic program on one machine, asking node1 and node2 to be
> hosts. Running `netstat -a | grep node1` on node2 shows that node1 and
> node2 are connected over TCP, as the connection is marked ESTABLISHED. I
> see the same thing when I run `netstat -a | grep node2` on node1. However,
> the program still blocks.
I'm not entirely clear which combinations are working and which are not. Can
you specify which ones work? You might want to try the ring_c.c program in the
OMPI examples/ directory -- it's a trivial "send a message around in a ring"
program that runs with any number of processes >= 2. For example:
- on node1, "mpirun --host node1,node2 ring_c"
- on node1, "mpirun --host node1,node3 ring_c"
- on node1, "mpirun --host node2,node3 ring_c"
- on node1, "mpirun --host node1,node2,node3 ring_c"
Repeat all 4 from node2.
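
If it helps, the four runs above could be scripted roughly as follows. This is
only a sketch under a few assumptions: ring_c is found the same way as in the
commands above, GNU coreutils `timeout` is installed, and the 30-second default
limit, the DRY_RUN and LIMIT variables, and the run_ring_tests function name
are all made up here for illustration. The point is that a run that blocks
gets reported as TIMEOUT instead of hanging your terminal.

```shell
#!/bin/sh
# Sketch: run ring_c over each host combination with a time limit.
# HOST_COMBOS mirrors the four --host lists given above.
HOST_COMBOS="node1,node2 node1,node3 node2,node3 node1,node2,node3"

run_ring_tests() {
    for hosts in $HOST_COMBOS; do
        if [ -n "$DRY_RUN" ]; then
            # Dry run: only print the command that would be executed.
            echo "mpirun --host $hosts ring_c"
            continue
        fi
        # GNU timeout exits with status 124 when the time limit is hit.
        timeout "${LIMIT:-30}" mpirun --host "$hosts" ring_c
        status=$?
        case $status in
            0)   echo "$hosts: OK" ;;
            124) echo "$hosts: TIMEOUT (blocked)" ;;
            *)   echo "$hosts: FAILED (exit status $status)" ;;
        esac
    done
}
```

Source the file and call run_ring_tests on node1, then again on node2, and send
the OK / TIMEOUT / FAILED lines for each. Setting DRY_RUN=1 just prints the
mpirun commands without running them.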
> What else could cause this failure?
> --
> Benjamin BOUVIER
>
> ________________________________________
> To start, I would ensure that all firewalling (e.g., iptables) is disabled
> on all machines involved.
>
> On Jun 11, 2012, at 10:16 AM, BOUVIER Benjamin wrote:
>
>> Hi,
>>
>>> I'd guess that running NetPIPE with 3 procs may be undefined.
>>
>> It is indeed undefined. Running the NetPIPE program locally with 3
>> processes blocks on my computer.
>>
>> This issue is especially weird because there is no problem running the
>> example program over the network with the MPICH2 implementation, for 2
>> processes.
>>
>> However, with MPICH2, it fails with 3 processes and also blocks on connect
>> ("Connection refused"), which could indicate that it's actually a network
>> issue affecting both MPICH2 and OMPI. I don't know how many connections OMPI
>> uses to send the data in the example program. If it tries to open 2
>> connections while MPICH2 uses only one for the same program (both are
>> hypotheses on my part), then the number of connections may be the right
>> place to look. I'll ask MPICH2 users on their mailing list to get their
>> opinion about it.
>>
>> Now that I know the program fails with both the OMPI and MPICH2
>> implementations, I guess the problem does not depend on the MPI
>> implementation.
>>
>> If you have any ideas or comments, I would be pleased to hear them.
>>
>> --
>> Benjamin Bouvier
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> [email protected]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/