Just to be clear: do you have two physical nodes? Or just one physical node and you are running two VMs on it?
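
In case it helps while sorting that out: ssh working between the nodes only shows that port 22 is reachable, whereas the Open MPI daemons connect back to mpirun over TCP ports that are, by default, chosen dynamically. Below is a minimal sketch (plain Python, not part of Open MPI) for checking whether an arbitrary TCP port on one node can be reached from the other. The function names, the script name tcp_check.py, and the port number in the usage note are my own placeholders, and I am assuming Python 3 is available on both nodes.

```python
import socket
import sys


def open_listener(bind_addr="0.0.0.0", port=0):
    """Open a listening TCP socket. port=0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(1)
    return srv


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args) == 2 and args[0] == "listen":
        srv = open_listener(port=int(args[1]))
        print("listening on port", srv.getsockname()[1])
        conn, peer = srv.accept()  # blocks until the other node connects
        print("connection from", peer[0])
        conn.close()
        srv.close()
    elif len(args) == 3 and args[0] == "connect":
        ok = can_connect(args[1], int(args[2]))
        print("reachable" if ok else "NOT reachable")
    else:
        print("usage: tcp_check.py listen PORT | connect HOST PORT")
```

One way to use it: run "python3 tcp_check.py listen 5000" on one node, then "python3 tcp_check.py connect ADDRESS 5000" on the other, once with the internal address and once with the floating IP, and then repeat in the opposite direction. If the connection fails on the address that mpirun reports as the remote host, that would suggest the problem lies in the network path between the VMs rather than in Open MPI itself.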

> On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>
> I have a floating IP for accessing the nodes from outside the cluster, plus
> internal IP addresses. I tried running the jobs with both IP addresses, but
> it makes no difference.
> I have also just installed openmpi 1.6.5 to see how that version behaves. In
> that case I get nothing and have to press Ctrl+C; no output or error is
> shown.
>
>
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain
> [r...@open-mpi.org]
> Sent: 28 March 2015 17:03
> To: Open MPI Users
> Subject: Re: [OMPI users] Connection problem on Linux cluster
>
> You mentioned running this in a VM - is that IP address correct for getting
> across the VMs?
>
>
>> On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>
>> Hi,
>>
>> I am wondering how I can solve this problem.
>> System spec:
>> 1- Linux cluster with two nodes (master and slave) with Ubuntu 12.04 LTS
>>    32bit.
>> 2- openmpi 1.8.4
>>
>> I do a simple test running on fehg_node_0:
>> > mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20
>>
>> and I get the following error:
>>
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>>   Local host: fehg-node-0
>>   Remote host: 10.104.5.40
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>>
>> * not finding the required libraries and/or binaries on
>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>
>> * lack of authority to execute on one or more specified nodes.
>>   Please verify your allocation and authorities.
>>
>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>   Please check with your sys admin to determine the correct location to use.
>>
>> * compilation of the orted with dynamic libraries when static are required
>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>   one of the contrib/platform definitions for your system type.
>>
>> * an inability to create a connection back to mpirun due to a
>>   lack of common network interfaces and/or no route found between
>>   them. Please check network connectivity (including firewalls
>>   and network routing requirements).
>>
>> Further details:
>> 1- I have full access to the VMs on the cluster and set everything up myself.
>> 2- Firewall and iptables are disabled on all nodes.
>> 3- The nodes can ssh to each other with no problem.
>> 4- Non-interactive bash calls work fine, i.e. when I run
>>    ssh othernode env | grep PATH
>>    from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
>> 5- I have checked the earlier posts; a similar problem was reported for
>>    Solaris, but I could not find a clue about mine.
>> 6- Running with --enable-orterun-prefix-by-default makes no difference.
>> 7- I see orte running on the other node when I check processes, but
>>    nothing happens after that until the error appears.
>>
>> Regards,
>> Karos
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/03/26555.php
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/03/26557.php