Thanks for the help.  I've replied below.

--- "G.O." <gurhan.o...@gmail.com> wrote:

>     1- Check to make sure that there are no firewalls blocking
> traffic between the nodes.

There is no firewall between the nodes.  If I run jobs directly via
ssh, e.g. "ssh node4 env" they work.

>     2 - Check to make sure that all nodes have the openmpi installed
> and have the very same executable you are trying to run on the same
> path, have all permissions correctly.

Yes, they are all installed to /usr/local, the permissions are the
same, and if I just invoke mpirun on an individual node by logging into
it, it works.  In fact, even commands like "ssh node4 mpirun" (just to
get the mpirun help banner) work.

>     3- Check to make sure that all nodes have the same interface,
> i.e. eth0 .

They all do have the same interfaces.  In my configuration, eth1 is
the interface that corresponds to the cluster IP network.  I have tried
using "--mca btl_tcp_if_include eth1" but it seems to make no
difference.

>    That's all i can think of for very quick checks for now. Hope it's
> one of this.

Thank you very much, but unfortunately it isn't any of these, as far as
I can tell.



      