When you said --enable-debug was not activated, I reinstalled it to make sure. I have only one MPI installation on all of the VMs.
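For reference, a rebuild with debugging enabled would look roughly like this (the prefix matches the ompi_info output below; the source directory name is just an example):

ubuntu@fehg-node-0:~/openmpi-1.6.5$ ./configure --prefix=/usr/local/openmpi --enable-debug
ubuntu@fehg-node-0:~/openmpi-1.6.5$ sudo make all install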

FYI: I have just tried MPICH to see how it works. It freezes for a few minutes and then comes back with an error complaining about the firewall. By the way, I already have the firewall disabled and iptables is set to allow all connections. I checked with the system admin and there is no other firewall between the nodes.

Here is the output you asked for:

ubuntu@fehg-node-0:~$ which mpirun
/usr/local/openmpi/bin/mpirun
ubuntu@fehg-node-0:~$ ompi_info
                 Package: Open MPI ubuntu@fehg-node-0 Distribution
                Open MPI: 1.6.5
   Open MPI SVN revision: r28673
   Open MPI release date: Jun 26, 2013
                Open RTE: 1.6.5
   Open RTE SVN revision: r28673
   Open RTE release date: Jun 26, 2013
                    OPAL: 1.6.5
       OPAL SVN revision: r28673
       OPAL release date: Jun 26, 2013
                 MPI API: 2.1
            Ident string: 1.6.5
                  Prefix: /usr/local/openmpi
 Configured architecture: i686-pc-linux-gnu
          Configure host: fehg-node-0
           Configured by: ubuntu
           Configured on: Sat Mar 28 20:19:28 UTC 2015
          Configure host: fehg-node-0
                Built by: root
                Built on: Sat Mar 28 20:30:18 UTC 2015
              Built host: fehg-node-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: no
      Fortran90 bindings: no
 Fortran90 bindings size: na
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
  C compiler family name: GNU
      C compiler version: 4.6.3
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: none
  Fortran77 compiler abs: none
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: no
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
           Sparse Groups: no
  Internal debug support: yes
  MPI interface warnings: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
     Symbol vis. support: yes
   Host topology support: yes
          MPI extensions: affinity example
   FT Checkpoint support: no (checkpoint thread: no)
     VampirTrace support: yes
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
              MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
           MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
               MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
           MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
             MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
               MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
               MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
              MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
             MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
            MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
            MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)


Regards,
Karos



________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 28 March 2015 22:04
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Something is clearly wrong. Most likely, you are not pointing at the OMPI install that you think you are, or you didn’t really configure it properly. Check the path by running “which mpirun” and ensure you are executing the one you expected. If so, then run “ompi_info” to see how it was configured and send it to us.


On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Surprisingly, that is all I get; nothing else comes after. This is the same for openmpi-1.6.5.


________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 20:12
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Did you configure --enable-debug? We aren’t seeing any of the debug output, so I suspect not.


On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have done it and here are the results:

ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca 
state_base_verbose 10 hostname
[fehg-node-0:30034] mca: base: components_open: Looking for oob components
[fehg-node-0:30034] mca: base: components_open: opening oob components
[fehg-node-0:30034] mca: base: components_open: found loaded component tcp
[fehg-node-0:30034] mca: base: components_open: component tcp register function 
successful
[fehg-node-0:30034] mca: base: components_open: component tcp open function 
successful
[fehg-node-7:31138] mca: base: components_open: Looking for oob components
[fehg-node-7:31138] mca: base: components_open: opening oob components
[fehg-node-7:31138] mca: base: components_open: found loaded component tcp
[fehg-node-7:31138] mca: base: components_open: component tcp register function 
successful
[fehg-node-7:31138] mca: base: components_open: component tcp open function 
successful

It freezes at this point ...

Regards

________________________________
From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. [foad.lotfi...@durham.ac.uk]
Sent: 28 March 2015 18:49
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

fehg_node_1 and fehg-node-7 are the same; it is just a typo.

Correction: the VM names are fehg-node-0 and fehg-node-7.


Regards,

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 18:23
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Just to be clear: do you have two physical nodes? Or just one physical node and 
you are running two VMs on it?

On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

I have a floating IP for accessing the nodes from outside the cluster, plus internal IP addresses. I tried running the jobs with both of them (both IP addresses), but it makes no difference.
I have just installed openmpi 1.6.5 to see how this version works. In this case I get nothing and I have to press Ctrl+C; no output or error is shown.
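(For what it's worth, the address selection can also be forced explicitly through Open MPI's TCP interface-include parameters; a sketch of that kind of command, where eth0 is only a placeholder for whichever interface carries the internal addresses on these VMs:)

mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 -host fehg-node-0,fehg-node-7 hostname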


________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
Sent: 28 March 2015 17:03
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

You mentioned running this in a VM - is that IP address correct for getting 
across the VMs?


On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:

Hi,

I am wondering how I can solve this problem.
System spec:
1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS 32-bit.
2- openmpi 1.8.4

I run a simple test on fehg_node_0:
> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20
and I get the following error:

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    fehg-node-0
  Remote host:   10.104.5.40
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).

Notes:
1- I have full access to the VMs on the cluster and set up everything myself.
2- The firewall and iptables are disabled on both nodes.
3- The nodes can SSH to each other with no problem.
4- Non-interactive bash calls work fine, i.e. when I run ssh othernode env | grep PATH from both nodes, both PATH and LD_LIBRARY_PATH are set correctly (the full check is spelled out after this list).
5- I have checked previous posts; a similar problem was reported for Solaris, but I could not find a clue about mine.
6- Building with --enable-orterun-prefix-by-default does not make any difference.
7- I can see orted running on the other node when I check the processes, but nothing happens after that and then the error appears.
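For completeness, the check mentioned in point 4 looks roughly like this (fehg-node-7 stands in for the other node; the grep pattern is just a convenience):

ubuntu@fehg-node-0:~$ ssh fehg-node-7 env | grep -E '^PATH=|^LD_LIBRARY_PATH='   # both should point at the same Open MPI install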

Regards,
Karos