When you said --enable-debug is not activated, I installed it again to make sure. I have only one MPI installed on all the VMs.
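For completeness, the quickest way I know of to confirm the debug build on each node is to grep the ompi_info output (the full dump is below; the line shown here is taken from it), and to repeat the same check over ssh so that the non-interactive environment is exercised too:

ubuntu@fehg-node-0:~$ ompi_info | grep -i "debug support"
Internal debug support: yes
ubuntu@fehg-node-0:~$ ssh fehg-node-7 'ompi_info | grep -i "debug support"'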
FYI: I have just tried MPICH to see how it behaves. It freezes for a few minutes and then comes back with an error complaining about the firewall! By the way, I already have the firewall disabled and iptables is set to allow all connections. I checked with the system admin and there is no other firewall between the nodes.

Here is the output you asked for:

ubuntu@fehg-node-0:~$ which mpirun
/usr/local/openmpi/bin/mpirun
ubuntu@fehg-node-0:~$ ompi_info
Package: Open MPI ubuntu@fehg-node-0 Distribution
Open MPI: 1.6.5
Open MPI SVN revision: r28673
Open MPI release date: Jun 26, 2013
Open RTE: 1.6.5
Open RTE SVN revision: r28673
Open RTE release date: Jun 26, 2013
OPAL: 1.6.5
OPAL SVN revision: r28673
OPAL release date: Jun 26, 2013
MPI API: 2.1
Ident string: 1.6.5
Prefix: /usr/local/openmpi
Configured architecture: i686-pc-linux-gnu
Configure host: fehg-node-0
Configured by: ubuntu
Configured on: Sat Mar 28 20:19:28 UTC 2015
Configure host: fehg-node-0
Built by: root
Built on: Sat Mar 28 20:30:18 UTC 2015
Built host: fehg-node-0
C bindings: yes
C++ bindings: yes
Fortran77 bindings: no
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.6.3
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: none
Fortran77 compiler abs: none
Fortran90 compiler: none
Fortran90 compiler abs: none
C profiling: yes
C++ profiling: yes
Fortran77 profiling: no
Fortran90 profiling: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI interface warnings: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity example
FT Checkpoint support: no (checkpoint thread: no)
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)
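Something else that might be worth trying, given that each VM has both a floating address and an internal one: pinning Open MPI's TCP traffic to the internal interface explicitly. eth0 below is only a guess at the interface name - whatever ifconfig reports for the internal address on each VM is what should go there (oob_tcp_if_include covers the run-time wire-up, btl_tcp_if_include the MPI traffic itself):

ubuntu@fehg-node-0:~$ mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 -host fehg-node-7 hostname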
Regards,
Karos

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 22:04
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Something is clearly wrong. Most likely, you are not pointing at the OMPI install that you think you are - or you didn't really configure it properly. Check the path by running "which mpirun" and ensure you are executing the one you expected. If so, then run "ompi_info" to see how it was configured and send it to us.

On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Surprisingly, that is all I get! Nothing else comes after. It is the same for openmpi-1.6.5.

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 20:12
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Did you configure --enable-debug? We aren't seeing any of the debug output, so I suspect not.

On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have done it and these are the results:

ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca state_base_verbose 10 hostname
[fehg-node-0:30034] mca: base: components_open: Looking for oob components
[fehg-node-0:30034] mca: base: components_open: opening oob components
[fehg-node-0:30034] mca: base: components_open: found loaded component tcp
[fehg-node-0:30034] mca: base: components_open: component tcp register function successful
[fehg-node-0:30034] mca: base: components_open: component tcp open function successful
[fehg-node-7:31138] mca: base: components_open: Looking for oob components
[fehg-node-7:31138] mca: base: components_open: opening oob components
[fehg-node-7:31138] mca: base: components_open: found loaded component tcp
[fehg-node-7:31138] mca: base: components_open: component tcp register function successful
[fehg-node-7:31138] mca: base: components_open: component tcp open function successful
freeze ...
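(A possible next step, not tried yet: rerunning the same command with the daemons left attached, so that anything the remote orted prints to stderr becomes visible as well:)

ubuntu@fehg-node-0:~$ mpirun --debug-daemons --leave-session-attached -host fehg-node-7 -mca oob_base_verbose 100 hostname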
Regards

________________________________
From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. [foad.lotfi...@durham.ac.uk]
Sent: 28 March 2015 18:49
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

fehg_node_1 and fehg-node-7 are the same; it was just a typo. Correction: the VM names are fehg-node-0 and fehg-node-7.

Regards,

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 18:23
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Just to be clear: do you have two physical nodes? Or just one physical node and you are running two VMs on it?

On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have a floating IP for accessing the nodes from outside the cluster, plus internal IP addresses. I tried to run the jobs with both of them (both IP addresses), but it makes no difference. I have just installed openmpi 1.6.5 to see how that version behaves. In that case I get nothing at all and have to press Ctrl+C; no output or error is shown.

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 17:03
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

You mentioned running this in a VM - is that IP address correct for getting across the VMs?

On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Hi,

I am wondering how I can solve this problem.

System spec:
1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS 32-bit.
2- openmpi 1.8.4

I do a simple test, running on fehg_node_0:

> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20

and I get the following error:

A process or daemon was unable to complete a TCP connection to another process:
  Local host:  fehg-node-0
  Remote host: 10.104.5.40
This is usually caused by a firewall on the remote host. Please check that any firewall (e.g., iptables) has been disabled and try again.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
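(Regarding the firewall hint above: a quick way to confirm that arbitrary TCP ports really are reachable between the two VMs is to listen on one node and connect from the other. The netcat syntax below assumes the OpenBSD flavour that Ubuntu 12.04 ships by default, and 12345 is just an arbitrary unprivileged port:)

ubuntu@fehg-node-7:~$ nc -l 12345
ubuntu@fehg-node-0:~$ nc -vz fehg-node-7 12345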
More details:
1- I have full access to the VMs on the cluster and set everything up myself.
2- Firewall and iptables are disabled on the nodes.
3- The nodes can ssh to each other with no problem.
4- Non-interactive bash calls work fine, i.e. when I run ssh othernode env | grep PATH from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
5- I have checked the posts; a similar problem was reported for Solaris, but I could not find a clue about mine.
6- Configuring with --enable-orterun-prefix-by-default does not make any difference.
7- I can see orted running on the other node when I check the processes, but nothing happens after that and the error above appears.

Regards,
Karos