I'll recompile it in my home directory to see how it works.

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 23:13
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Doug is correct, and we usually suggest you build it under your own home 
directory to make it easier to clean up at a later time.

The only thing I can suggest is talking to the sys admin some more about TCP 
connections between VMs on OpenStack and getting their help. Something is 
obviously blocking communications, but it is likely something only they can 
identify. Clouds tend to be finicky in that regard.

You could also try the standard network diagnostics to see if TCP is capable of 
getting through.
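
As one way to run such a diagnostic, here is a minimal Python sketch, not part of Open MPI. It rests on the observation that working ssh only shows that port 22 is reachable, so it may be worth testing an arbitrary high port between the two VMs. The function names and the port number in the usage note are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def listen_once(port, bind_addr="0.0.0.0", timeout=60.0):
    """Accept one TCP connection on port; return the peer IP, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((bind_addr, port))
        srv.listen(1)
        srv.settimeout(timeout)
        try:
            conn, peer = srv.accept()
        except socket.timeout:
            return None
        conn.close()
        return peer[0]
```

A possible use, assuming port 5000 is free: run listen_once(5000) on fehg-node-7, then call can_connect("fehg-node-7", 5000) on fehg-node-0, and repeat in the other direction. If the listener times out while ssh still works, that would point at filtering somewhere between the VMs rather than at the MPI install.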


On Mar 28, 2015, at 4:00 PM, Douglas L Reeder <d...@centurylink.net> wrote:

Building as root is a bad idea. Try building it as a regular user, using sudo 
make install if necessary.

Doug Reeder

On Mar 28, 2015, at 4:53 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

When you said --enable-debug was not activated, I installed it again to make 
sure. I have only one MPI installed on all VMs.

FYI: I have just tried MPICH to see how it works. It freezes for a few 
minutes and then comes back with an error complaining about the firewall! By 
the way, I already have the firewall disabled and iptables is set to allow all 
connections. I checked with the system admin and there is no other firewall 
between the nodes.

Here is the output you asked for:

ubuntu@fehg-node-0:~$ which mpirun
/usr/local/openmpi/bin/mpirun
ubuntu@fehg-node-0:~$ ompi_info
                 Package: Open MPI ubuntu@fehg-node-0 Distribution
                Open MPI: 1.6.5
   Open MPI SVN revision: r28673
   Open MPI release date: Jun 26, 2013
                Open RTE: 1.6.5
   Open RTE SVN revision: r28673
   Open RTE release date: Jun 26, 2013
                    OPAL: 1.6.5
       OPAL SVN revision: r28673
       OPAL release date: Jun 26, 2013
                 MPI API: 2.1
            Ident string: 1.6.5
                  Prefix: /usr/local/openmpi
 Configured architecture: i686-pc-linux-gnu
          Configure host: fehg-node-0
           Configured by: ubuntu
           Configured on: Sat Mar 28 20:19:28 UTC 2015
          Configure host: fehg-node-0
                Built by: root
                Built on: Sat Mar 28 20:30:18 UTC 2015
              Built host: fehg-node-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: no
      Fortran90 bindings: no
 Fortran90 bindings size: na
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 4.6.3
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: none
  Fortran77 compiler abs: none
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: no
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
           Sparse Groups: no
  Internal debug support: yes
  MPI interface warnings: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity example
   FT Checkpoint support: no (checkpoint thread: no)
     VampirTrace support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
              MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
           MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
           MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
             MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
               MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
            MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
            MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)


Regards,
Karos



________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 22:04
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Something is clearly wrong. Most likely, you are not pointing at the OMPI 
install that you think you are - or you didn’t really configure it properly. 
Check the path by running “which mpirun” and ensure you are executing the one 
you expected. If so, then run “ompi_info” to see how it was configured and send 
it to us.
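
The two checks above could be scripted roughly as follows. This is only a sketch: it assumes ompi_info prints "key: value" lines in the layout shown elsewhere in this thread, and the helper names are hypothetical.

```python
import shutil
import subprocess


def parse_ompi_info(text):
    """Turn 'key: value' lines from ompi_info into a dict.

    Repeated keys (for example the many 'MCA ...' lines) keep only the
    last value, which is enough for the single-valued fields used here.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_install():
    """Report which mpirun is first on PATH and how that install was built."""
    mpirun = shutil.which("mpirun")
    ompi_info = shutil.which("ompi_info")
    if mpirun is None or ompi_info is None:
        return None
    out = subprocess.run([ompi_info], capture_output=True, text=True).stdout
    info = parse_ompi_info(out)
    return {
        "mpirun": mpirun,
        "version": info.get("Open MPI"),
        "prefix": info.get("Prefix"),
        "debug": info.get("Internal debug support"),
    }
```

For the ompi_info output posted in this thread, the three parsed fields would read 1.6.5, /usr/local/openmpi, and yes. Running the same check on every node makes it easier to spot a node that picks up a different install.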


On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Surprisingly, that is all I get! Nothing else comes after it. It is the same 
with openmpi-1.6.5.


________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 20:12
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Did you configure with --enable-debug? We aren’t seeing any of the debug 
output, so I suspect not.


On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have done it, and these are the results:

ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca 
state_base_verbose 10 hostname
[fehg-node-0:30034] mca: base: components_open: Looking for oob components
[fehg-node-0:30034] mca: base: components_open: opening oob components
[fehg-node-0:30034] mca: base: components_open: found loaded component tcp
[fehg-node-0:30034] mca: base: components_open: component tcp register function 
successful
[fehg-node-0:30034] mca: base: components_open: component tcp open function 
successful
[fehg-node-7:31138] mca: base: components_open: Looking for oob components
[fehg-node-7:31138] mca: base: components_open: opening oob components
[fehg-node-7:31138] mca: base: components_open: found loaded component tcp
[fehg-node-7:31138] mca: base: components_open: component tcp register function 
successful
[fehg-node-7:31138] mca: base: components_open: component tcp open function 
successful

freeze ...

Regards

________________________________
From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. 
[foad.lotfi...@durham.ac.uk]
Sent: 28 March 2015 18:49
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

fehg_node_1 and fehg-node-7 are the same; it was just a typo.

Correction: VM names are fehg-node-0 and fehg-node-7.
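
Since the first command in this thread was written with underscores while the VMs are named with hyphens, a quick syntax check can catch this kind of typo before mpirun is involved. The sketch below assumes the RFC 952 / RFC 1123 host name rules (letters, digits, and hyphens only); the function name is hypothetical, and passing the check does not guarantee that the name resolves.

```python
import re

# One label: letters, digits, and hyphens, with no leading or trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name):
    """Return True if every dot-separated label follows RFC 1123 host name rules."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

With this check, "fehg-node-0" is accepted and "fehg_node_0" is rejected because of the underscores.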


Regards,

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 18:23
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Just to be clear: do you have two physical nodes? Or just one physical node and 
you are running two VMs on it?

On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have a floating IP for accessing the nodes from outside the cluster, plus 
internal IP addresses. I tried running the jobs with both kinds of address, 
but it makes no difference.
I have just installed openmpi 1.6.5 to see how that version works. In this 
case I get nothing and I have to press Ctrl+C. No output or error is shown.


________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 17:03
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

You mentioned running this in a VM - is that IP address correct for getting 
across the VMs?


On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Hi,

I am wondering how I can solve this problem.
System spec:
1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS 32-bit.
2- Open MPI 1.8.4

I do a simple test running on fehg_node_0:
> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20

and I get the following error:

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    fehg-node-0
  Remote host:   10.104.5.40
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

Details:
1- I have full access to the VMs on the cluster and set everything up myself.
2- Firewall and iptables are disabled on all the nodes.
3- The nodes can ssh to each other with no problem.
4- Non-interactive bash calls work fine, i.e. when I run "ssh othernode env | 
grep PATH" from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
5- I have checked the posts; a similar problem was reported for Solaris, but I 
could not find a clue about mine.
6- Running with --enable-orterun-prefix-by-default does not change anything.
7- I see orte running on the other node when I check the processes, but nothing 
happens after that and then the error appears.

Regards,
Karos
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/03/26555.php