You mentioned running this in VMs. Is that IP address (10.104.5.40) the right one
for reaching the other VM?
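One way to check, from fehg-node-0, is to ask the kernel which local address it would use to reach 10.104.5.40. If that address isn't on a network the other VM can see (a NAT or host-only adapter, say), the daemons may be trying to talk over an interface that doesn't reach across. A rough Python sketch, nothing Open MPI specific; the function and script names are just made up for illustration:

```python
import socket
import sys


def local_source_address(remote_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the kernel would use to reach remote_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel
    consult its routing table and bind an outgoing address. The port is
    arbitrary (9 is the old "discard" service).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((remote_ip, port))  # raises OSError if there is no route
        return s.getsockname()[0]


if __name__ == "__main__":
    # Pass one or more peer addresses, e.g. the 10.104.5.40 from the error.
    for ip in sys.argv[1:]:
        try:
            print(f"{ip}: reached via local address {local_source_address(ip)}")
        except OSError as exc:
            print(f"{ip}: no route ({exc})")
```

Run it as e.g. python3 route_check.py 10.104.5.40 on fehg-node-0, and the reverse on the other node. Since it never puts a packet on the wire, it's harmless to run anywhere.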


> On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
> 
> Hi,
> 
> I am wondering how I can solve this problem.
> System Spec:
> 1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS
> 32-bit.
> 2- Open MPI 1.8.4
> 
> I ran a simple test from fehg_node_0:
> > mpirun -mca oob_base_verbose 20 -host fehg_node_0,fehg_node_1 hello_world
> 
> and I get the following error:
> 
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    fehg-node-0
>   Remote host:   10.104.5.40
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
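If the addresses look right, it's worth confirming that plain TCP gets through in both directions, since that's exactly what this message complains about. A minimal Python sketch, assuming Python 3.8+ on both VMs (for socket.create_server). The script name tcp_probe.py and port 5000 are arbitrary; Open MPI picks its own ports, so this only shows whether TCP traffic passes between the nodes at all, not that the specific ports orted uses are open:

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) completes within timeout."""
    try:
        # create_connection resolves host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable (socket.timeout is an OSError).
        return False


def listen_once(port: int) -> None:
    """Accept a single TCP connection on all interfaces and report the peer."""
    with socket.create_server(("", port)) as srv:  # Python 3.8+
        conn, addr = srv.accept()
        with conn:
            print(f"accepted connection from {addr[0]}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   on one node:    python3 tcp_probe.py listen 5000
    #   on the other:   python3 tcp_probe.py 10.104.5.40 5000
    target, port = sys.argv[1], int(sys.argv[2])
    if target == "listen":
        listen_once(port)
    else:
        print("connected" if can_connect(target, port) else "could not connect")
```

Run the listener on one node and the probe from the other, then swap roles. The daemons have to connect back to mpirun, so success in only one direction isn't enough.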
> 
> Details:
> 1- I have full access to the VMs on the cluster and set everything up myself.
> 2- Firewalls (including iptables) are disabled on all nodes.
> 3- The nodes can ssh to each other without any problem.
> 4- Non-interactive bash calls work fine, i.e. when I run ssh othernode env |
> grep PATH from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
> 5- I have checked the list archives; a similar problem was reported on
> Solaris, but I could not find anything that helps with mine.
> 6- Using --enable-orterun-prefix-by-default does not change anything.
> 7- When I check the processes, I see the ORTE daemon (orted) running on the
> other node, but nothing happens after that, and then the error appears.
> 
> Regards,
> Karos
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26555.php
