Date: Tue, 29 Jul 2008 09:03:40 -0400
From: "Alexander Shabarshin" <ashabars...@developonbox.com>
Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
To: <us...@open-mpi.org>
Message-ID: <001e01c8f17b$867d2900$0349130a@Shabarshin>
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response

Hello

Yes, you are right - the subnets are different, but the routes are set up correctly and everything like ping, ssh, etc. works fine between them.
But it isn't a routing problem; it is a matter of how the TCP BTL in Open MPI decides which interfaces it can use to reach the other nodes (which is completely out of the hands of the TCP stack and the layers below it).
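For what it's worth, the set of interfaces the TCP BTL will consider can be narrowed explicitly with its MCA parameters. A minimal sketch, assuming the relevant interface is named eth0 on both sides (the interface name is only a placeholder, and this has not been verified against ClusterTools 7.1):

  mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 \
         --hostfile mpshosts -np 2 /mpi/sample

The complementary btl_tcp_if_exclude parameter can be used instead to rule out specific interfaces (the loopback, for example).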
Alexander Shabarshin

P.S. Between the Linux boxes I even tried different versions of OpenMPI, 1.2.4 and 1.2.5 - these versions work together correctly, but not with ClusterTools...
Are the Linux boxes on the same subnet?

--td
----- Original Message -----
From: "Terry Dontje" <terry.don...@sun.com>
To: <us...@open-mpi.org>
Sent: Tuesday, July 29, 2008 7:20 AM
Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
> I have not tested this type of setup, so the following disclaimer needs to
> be said: these are not exactly the same release number. They are close, but
> their code could contain something that makes them incompatible.
> One idea that comes to mind is whether the two nodes are on the same subnet.
> If they are not on the same subnet, I think there is a bug in which the TCP
> BTL will recuse itself from communications between the two nodes.
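As a quick check of what the TCP BTL on each side is actually configured to do, its parameters (including btl_tcp_if_include and btl_tcp_if_exclude) can be listed on both nodes; a sketch, whose output will differ between the two builds:

  ompi_info --param btl tcp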
>
> --td
>
>
>
> Date: Mon, 28 Jul 2008 16:58:57 -0400
> From: "Alexander Shabarshin" <ashabars...@developonbox.com>
> Subject: [OMPI users] Communication between OpenMPI and ClusterTools
> To: <us...@open-mpi.org>
> Message-ID: <010001c8f0f4$c1ec8990$e7afcea7@Shabarshin>
> Content-Type: text/plain; format=flowed; charset="koi8-r";
> reply-type=original
>
> Hello
>
> I am trying to launch the same MPI sample code on Linux PC (Intel processor)
> servers with OpenMPI 1.2.5 and on SunFire X2100 (AMD Opteron) servers with
> Solaris 10 and ClusterTools 7.1 (which looks like OpenMPI 1.2.5), using TCP
> over Ethernet. Linux PC with Linux PC works fine. SunFire with SunFire
> works fine. But when I launch the same task on Linux AND SunFire, I get this
> error message:
>
> --------------------------------------------------------------------------
> Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Process 0.1.0 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>  PML add procs failed
>  --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 1 with PID 25782 on node 10.0.0.2 exited on
> signal 15 (Terminated).
>
> it was launched by this command:
>
> mpirun --mca btl tcp,self --hostfile mpshosts -np 2 /mpi/sample
>
> /mpi/sample exists on both platforms and is compiled properly for each
> particular platform.
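For context, the mpshosts hostfile named in the command above would simply list the participating nodes, optionally with a slot count per node. A minimal sketch with placeholder addresses (the actual file contents were not included in the post):

  10.0.0.1 slots=1
  10.0.0.2 slots=1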
>
> The Linux machines have a replicated path for the Sun-style orted launch:
> /opt/SUNWhpc/HPC7.1/bin/orted
>
> The servers can ping each other and SSH works fine in both directions,
> but OpenMPI doesn't work across these servers... How can I make them
> understand each other? Thank you!
>
> Alexander Shabarshin
>
> P.S. This is output of ompi_info diagnostic for ClusterTools 7.1:
>
>                Open MPI: 1.2.5r16572-ct7.1b003r3852
>   Open MPI SVN revision: 0
>                Open RTE: 1.2.5r16572-ct7.1b003r3852
>   Open RTE SVN revision: 0
>                    OPAL: 1.2.5r16572-ct7.1b003r3852
>       OPAL SVN revision: 0
>                  Prefix: /opt/SUNWhpc/HPC7.1
> Configured architecture: i386-pc-solaris2.10
>           Configured by: root
>           Configured on: Tue Oct 30 17:37:07 EDT 2007
>          Configure host: burpen-csx10-0
>                Built by:
>                Built on: Tue Oct 30 17:52:10 EDT 2007
>              Built host: burpen-csx10-0
>              C bindings: yes
>            C++ bindings: yes
>      Fortran77 bindings: yes (all)
>      Fortran90 bindings: yes
> Fortran90 bindings size: small
>              C compiler: cc
>     C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
>            C++ compiler: CC
>   C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
>      Fortran77 compiler: f77
>  Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
>      Fortran90 compiler: f95
>  Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
>             C profiling: yes
>           C++ profiling: yes
>     Fortran77 profiling: yes
>     Fortran90 profiling: yes
>          C++ exceptions: yes
>          Thread support: no
>  Internal debug support: no
>     MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>         libltdl support: yes
>   Heterogeneous support: yes
> mpirun default --prefix: yes
>           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.5)
>         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
>         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
>                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
>              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
>                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
>                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
>              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
>              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
>              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
>                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
>                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
>               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
>                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
>                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
>
> and output of ompi_info diagnostic for OpenMPI 1.2.5 compiled on Linux:
>
>                Open MPI: 1.2.5
>   Open MPI SVN revision: r16989
>                Open RTE: 1.2.5
>   Open RTE SVN revision: r16989
>                    OPAL: 1.2.5
>       OPAL SVN revision: r16989
>                  Prefix: /usr/local
> Configured architecture: i686-pc-linux-gnu
>           Configured by: shaos
>           Configured on: Thu Jul 24 12:07:38 EDT 2008
>          Configure host: remote-linux
>                Built by: shaos
>                Built on: Thu Jul 24 12:23:40 EDT 2008
>              Built host: remote-linux
>              C bindings: yes
>            C++ bindings: yes
>      Fortran77 bindings: yes (all)
>      Fortran90 bindings: no
> Fortran90 bindings size: na
>              C compiler: gcc
>     C compiler absolute: /usr/bin/gcc
>            C++ compiler: g++
>   C++ compiler absolute: /usr/bin/g++
>      Fortran77 compiler: g77
>  Fortran77 compiler abs: /usr/bin/g77
>      Fortran90 compiler: none
>  Fortran90 compiler abs: none
>             C profiling: yes
>           C++ profiling: yes
>     Fortran77 profiling: yes
>     Fortran90 profiling: no
>          C++ exceptions: no
>          Thread support: posix (mpi: no, progress: no)
>  Internal debug support: no
>     MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>         libltdl support: yes
>   Heterogeneous support: yes
> mpirun default --prefix: no
>           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
>              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
>         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
>         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
>           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
>                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
>                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
>               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
>              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
>                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
>                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
>                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
>              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
>              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
>              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
>                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
>                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
>               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
>                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
>                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
>                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.5)
>                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
>                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.5)
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
