I'll recompile it in my home directory to see how it works.
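A minimal sketch of such a rebuild, assuming the openmpi-1.6.5 tarball is unpacked in the home directory (the install prefix and the -j job count are only illustrative):

  cd ~/openmpi-1.6.5
  ./configure --prefix=$HOME/openmpi-1.6.5-install --enable-debug
  make -j4 all
  make install
  # point each VM at the new build before re-testing
  export PATH=$HOME/openmpi-1.6.5-install/bin:$PATH
  export LD_LIBRARY_PATH=$HOME/openmpi-1.6.5-install/lib:$LD_LIBRARY_PATH

With the prefix under $HOME there is no need to build or install as root, which also addresses the point about root builds made later in this thread.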
________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 23:13
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Doug is correct, and we usually suggest you build it under your own home directory to make it easier to clean up at a later time. The only thing I can suggest is talking to the sys admin some more about TCP connections between VMs on OpenStack and getting their help. Something is obviously blocking communications, but it is likely something only they can identify. Clouds tend to be finicky in that regard. You could also try the standard network diagnostics to see if TCP is capable of getting through.
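A minimal sketch of such a diagnostic, assuming netcat is available on both VMs and using an arbitrary unprivileged port (the port number is only an example):

  # on fehg-node-7: listen on a high port
  nc -l 5555            # some netcat variants need: nc -l -p 5555
  # on fehg-node-0: try to reach it over the internal address reported in the error below
  nc -vz 10.104.5.40 5555

Since Open MPI picks ephemeral ports for its TCP connections by default, an OpenStack security group that only allows port 22 would show up as a failure in a test like this.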
On Mar 28, 2015, at 4:00 PM, Douglas L Reeder <d...@centurylink.net> wrote:

Building as root is a bad idea. Try building it as a regular user, using sudo make install if necessary.

Doug Reeder

On Mar 28, 2015, at 4:53 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

When you said --enable-debug was not activated, I installed it again to make sure. I have only one MPI installation on all the VMs.

FYI: I have just tried MPICH to see how it works. It freezes for a few minutes, then comes back with an error complaining about the firewall! By the way, I already have the firewall disabled and iptables is set to allow all connections. I checked with the system admin and there is no other firewall between the nodes.

Here is the output you asked for:

ubuntu@fehg-node-0:~$ which mpirun
/usr/local/openmpi/bin/mpirun

ubuntu@fehg-node-0:~$ ompi_info
Package: Open MPI ubuntu@fehg-node-0 Distribution
Open MPI: 1.6.5
Open MPI SVN revision: r28673
Open MPI release date: Jun 26, 2013
Open RTE: 1.6.5
Open RTE SVN revision: r28673
Open RTE release date: Jun 26, 2013
OPAL: 1.6.5
OPAL SVN revision: r28673
OPAL release date: Jun 26, 2013
MPI API: 2.1
Ident string: 1.6.5
Prefix: /usr/local/openmpi
Configured architecture: i686-pc-linux-gnu
Configure host: fehg-node-0
Configured by: ubuntu
Configured on: Sat Mar 28 20:19:28 UTC 2015
Configure host: fehg-node-0
Built by: root
Built on: Sat Mar 28 20:30:18 UTC 2015
Built host: fehg-node-0
C bindings: yes
C++ bindings: yes
Fortran77 bindings: no
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.6.3
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: none
Fortran77 compiler abs: none
Fortran90 compiler: none
Fortran90 compiler abs: none
C profiling: yes
C++ profiling: yes
Fortran77 profiling: no
Fortran90 profiling: no
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI interface warnings: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity example
FT Checkpoint support: no (checkpoint thread: no)
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)
Regards,
Karos

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 22:04
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Something is clearly wrong. Most likely, you are not pointing at the OMPI install that you think you are - or you didn't really configure it properly. Check the path by running "which mpirun" and ensure you are executing the one you expected. If so, then run "ompi_info" to see how it was configured and send it to us.

On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Surprisingly, that is all I get! Nothing else comes after it. This is the same for openmpi-1.6.5.

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 20:12
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Did you configure --enable-debug? We aren't seeing any of the debug output, so I suspect not.

On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have done it and these are the results:

ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca state_base_verbose 10 hostname
[fehg-node-0:30034] mca: base: components_open: Looking for oob components
[fehg-node-0:30034] mca: base: components_open: opening oob components
[fehg-node-0:30034] mca: base: components_open: found loaded component tcp
[fehg-node-0:30034] mca: base: components_open: component tcp register function successful
[fehg-node-0:30034] mca: base: components_open: component tcp open function successful
[fehg-node-7:31138] mca: base: components_open: Looking for oob components
[fehg-node-7:31138] mca: base: components_open: opening oob components
[fehg-node-7:31138] mca: base: components_open: found loaded component tcp
[fehg-node-7:31138] mca: base: components_open: component tcp register function successful
[fehg-node-7:31138] mca: base: components_open: component tcp open function successful

It freezes at this point ...

Regards
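A related probe, sketched here with the same host names, is to ask the remote daemon itself for debug output; --debug-daemons keeps the orted output attached to mpirun so it is visible where the launch hangs:

  mpirun -host fehg-node-7 --debug-daemons --mca plm_base_verbose 5 --mca oob_base_verbose 100 hostname

If the daemon on fehg-node-7 starts but its report back to mpirun never arrives, that again points at TCP between the VMs rather than at the launch itself.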
________________________________
From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. [foad.lotfi...@durham.ac.uk]
Sent: 28 March 2015 18:49
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

fehg_node_1 and fehg-node-7 are the same; it was just a typo. Correction: the VM names are fehg-node-0 and fehg-node-7.

Regards,

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 18:23
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Just to be clear: do you have two physical nodes? Or just one physical node and you are running two VMs on it?

On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have a floating IP for accessing the nodes from outside the cluster, plus internal IP addresses. I tried to run the jobs with both of them (both IP addresses), but it makes no difference. I have just installed Open MPI 1.6.5 to see how that version behaves. In this case I get nothing and have to press Ctrl+C; no output or error is shown.
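Given the mix of a floating IP and internal addresses, one sketch that may be worth trying (the interface name is an assumption; check it with ip addr or ifconfig on each VM) is to pin both the runtime wire-up and the MPI traffic to the internal interface:

  # eth0 stands for whichever interface carries the internal 10.x addresses
  mpirun -host fehg-node-7 --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 hostname

The matching oob_tcp_if_exclude / btl_tcp_if_exclude parameters can be used instead to drop the interface that holds the floating IP.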
________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 17:03
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

You mentioned running this in a VM - is that IP address correct for getting across the VMs?

On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Hi,

I am wondering how I can solve this problem.

System spec:
1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS 32-bit
2- Open MPI 1.8.4

I do a simple test running on fehg_node_0:

> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20

and I get the following error:

A process or daemon was unable to complete a TCP connection to another process:
  Local host:  fehg-node-0
  Remote host: 10.104.5.40
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and try again.
------------------------------------------------------------
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on one or more nodes.
  Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI
  with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a lack of common
  network interfaces and/or no route found between them. Please check network
  connectivity (including firewalls and network routing requirements).

Some more details:
1- I have full access to the VMs on the cluster and set up everything myself.
2- Firewall and iptables are all disabled on the nodes (see the quick checks below).
3- The nodes can ssh to each other with no problem.
4- Non-interactive bash calls work fine, i.e. when I run "ssh othernode env | grep PATH" from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
5- I have checked previous posts; a similar problem was reported for Solaris, but I could not find a clue about mine.
6- Running with --enable-orterun-prefix-by-default does not make any difference.
7- I see the orted daemon running on the other node when I check the processes, but nothing happens after that and the error appears.

Regards,
Karos
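Quick checks for items 2 and 4 above (host names as used in the thread; ufw may not even be installed on these images):

  sudo iptables -L -n -v      # all chains should show policy ACCEPT and no filtering rules
  sudo ufw status             # should report "inactive" if the Ubuntu firewall is off
  ssh fehg-node-7 'echo $PATH; echo $LD_LIBRARY_PATH; which orted'

The last line confirms that a non-interactive shell on the remote node finds the same orted that mpirun expects to launch.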