Pak,

Thanks. After I received your email I went back and checked my patch install logs (I hadn't missed that I needed the patch). It turns out the patch install had failed on that node; once I applied the patch by hand and rebooted, everything started working.
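
For anyone else who hits this, the by-hand fix was roughly the standard
Solaris patchadd/reboot sequence (the patch directory below is only an
example; use wherever you unpacked 125793-01):

   # patchadd /var/tmp/125793-01    (as root, pointing at the unpacked patch)
   # init 6                         (reboot the node)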

Thanks again for taking the time to reply at the weekend! Much appreciated.

   Glenn


Glenn,

Are you running Solaris 10 Update 3 (11/06) with patch 125793-01? That
patch is required for running with the udapl BTL.

http://www.sun.com/products-n-solutions/hardware/docs/html/819-7478-11/body.html#93180
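
If you are not sure whether it is there, something like this should tell
you (showrev is the stock Solaris tool for listing patches; no output from
the grep means the patch is not installed on that node):

   $ showrev -p | grep 125793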

Glenn Carver wrote:
 Further to my email below regarding problems with uDAPL across IB, I
 found a bug report lodged with Sun (also reported on OpenSolaris at:
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6545187).
 I will lodge a support call with Sun first thing Monday, though it
 might not get me very far.

 Would ditching ClusterTools, compiling the latest Open MPI, and trying
 the IB/OpenIB interface work for me?  Another option would be to revert
 to ClusterTools 6, but ideally I need the better implementation of
 MPI-2 that's in Open MPI.
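
 (If I did go that route, I assume the build and BTL selection would look
 roughly like the following, though whether an OpenIB/verbs stack is even
 usable on Solaris is exactly what I'm unsure about. "myhosts" and
 "my_mpi_app" are just placeholders.)

    $ ./configure --prefix=/opt/openmpi --with-openib
    $ make all install
    $ mpirun --mca btl openib,sm,self -np 8 -hostfile myhosts ./my_mpi_app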

 Any workarounds for the first issue would be appreciated, as would any
 advice on the second question!
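
 (The only workaround I can think of myself is to disable udapl and fall
 back to TCP over the IPoIB interface, something like the line below,
 though I assume performance would suffer; placeholders as above.)

    $ mpirun --mca btl tcp,sm,self --mca btl_tcp_if_include ibd1 \
        -np 8 -hostfile myhosts ./my_mpi_app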

    Thanks
                 Glenn


 Hi,

 I'm trying to set up a new small cluster based on Sun X4100 servers
 running Solaris 10 x86. I have the Open MPI that comes with
 ClusterTools 7.  In addition, I have an InfiniBand network between
 the nodes.  I can run parallel jobs fine as long as the processes stay
 on one node (each node has 4 cores). However, as soon as I try to run
 across nodes I get these errors from the job:


[node3][0,1,8][/ws/hpc-ct-7/builds/7.0.build-ct7-030/ompi-ct7/ompi/mca/btl/udapl/btl_udapl_component.c:827:mca_btl_udapl_component_progress]
 WARNING : Connection event not handled : 16391

 I've had a good look through the archives but can't find a reference
 to this error. I realise the udapl interface is a Sun addition to
 Open MPI, but I'm hoping someone else will have seen this before and
 will know what's wrong. I have checked that my IB network is
 functioning correctly (that seemed the obvious thing that could be wrong).
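
 (By "checked" I mean basic IPoIB connectivity, i.e. pinging each node's
 ibd1 address from the others; the address below is just an example from
 my 192.168.50.x IB subnet.)

    $ ping 192.168.50.201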

 Any pointers on what could be wrong would be much appreciated.

         Glenn

 ifconfig for the IB port reports:

 $ ifconfig ibd1
 ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
          inet 192.168.50.200 netmask ffffff00 broadcast 192.168.50.255

 .. and for the other configured interface:

 $ ifconfig e1000g0
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
          inet 192.168.47.190 netmask ffffff00 broadcast 192.168.47.255

 Output from ompi_info is:

 $ ompi_info | more
                  Open MPI: 1.2.1r14096-ct7b030r1838
     Open MPI SVN revision: 0
                  Open RTE: 1.2.1r14096-ct7b030r1838
     Open RTE SVN revision: 0
                      OPAL: 1.2.1r14096-ct7b030r1838
         OPAL SVN revision: 0
                    Prefix: /opt/SUNWhpc/HPC7.0
   Configured architecture: i386-pc-solaris2.10
             Configured by: root
             Configured on: Fri Mar 30 13:40:12 EDT 2007
             Configure host: burpen-csx10-0
                  Built by: root
                  Built on: Fri Mar 30 13:57:25 EDT 2007
                Built host: burpen-csx10-0
                C bindings: yes
              C++ bindings: yes
        Fortran77 bindings: yes (all)
        Fortran90 bindings: yes
   Fortran90 bindings size: trivial
                C compiler: cc
       C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
              C++ compiler: CC
     C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
        Fortran77 compiler: f77
    Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
        Fortran90 compiler: f95
    Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
               C profiling: yes
             C++ profiling: yes
       Fortran77 profiling: yes
       Fortran90 profiling: yes
            C++ exceptions: yes
            Thread support: no
    Internal debug support: no
       MPI parameter check: runtime
 Memory profiling support: no
 Memory debugging support: no
           libltdl support: yes
     Heterogeneous support: yes
   mpirun default --prefix: yes
              MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
             MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
             MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
             MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
             MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                  MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                    MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
                MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
                MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                   MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                   MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                   MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
                MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
                MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
                MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                    MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                    MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                   MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                   MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                    MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                    MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
                  MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                  MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                   MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                    MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                    MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                   MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
                   MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
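
 If it's useful I can also send the udapl BTL's parameter settings; I
 believe those can be listed with:

    $ ompi_info --param btl udapl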


--


- Pak Lui
pak....@sun.com