Thanks for the help. I've replied below.

--- "G.O." <gurhan.o...@gmail.com> wrote:
> 1- Check to make sure that there are no firewalls blocking
> traffic between the nodes.

There is no firewall between the nodes. If I run jobs directly via ssh, e.g. "ssh node4 env", they work.

> 2 - Check to make sure that all nodes have the openmpi installed
> and have the very same executable you are trying to run on the same
> path, have all permissions correctly.

Yes, they are all installed in /usr/local, the permissions are the same, and if I log into an individual node and invoke mpirun there, it works. In fact, even commands like "ssh node4 mpirun" (just to get the mpirun help banner) work.

> 3- Check to make sure that all nodes have the same interface,
> i.e. eth0 .

They all have the same interfaces. In my configuration, eth1 is the interface that corresponds to the cluster IP network. I have tried using "--mca btl_tcp_if_include eth1", but it seems to make no difference.

> That's all i can think of for very quick checks for now. Hope it's
> one of this.

Thank you very much, but unfortunately it isn't any of these, as far as I can tell.