Andrew,

Failure looks like:

> + mpirun --prefix /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron -np 8 -machinefile H ./a.out
> Process 0 of 8 on s1470
> Process 1 of 8 on s1470
> Process 4 of 8 on s1469
> Process 2 of 8 on s1470
> Process 7 of 8 on s1469
> Process 5 of 8 on s1469
> Process 6 of 8 on s1469
> Process 3 of 8 on s1470
> 30989:a.out *->0 (f=noaffinity,0,1,2,3)
> 30988:a.out *->0 (f=noaffinity,0,1,2,3)
> 30990:a.out *->0 (f=noaffinity,0,1,2,3)
> 30372:a.out *->0 (f=noaffinity,0,1,2,3)
> 30991:a.out *->0 (f=noaffinity,0,1,2,3)
> 30370:a.out *->0 (f=noaffinity,0,1,2,3)
> 30369:a.out *->0 (f=noaffinity,0,1,2,3)
> 30371:a.out *->0 (f=noaffinity,0,1,2,3)
>  get ASYNC ERROR = 6
> [s1469:30369] *** Process received signal ***
> [s1469:30369] Signal: Segmentation fault (11)
> [s1469:30369] Signal code: Address not mapped (1)
> [s1469:30369] Failing at address: 0x110
> [s1469:30369] [ 0] /lib64/libpthread.so.0 [0x2b528ceefc10]
> [s1469:30369] [ 1] /lib64/libdapl.so(dapl_llist_next_entry+0x25) [0x2b528fba5df5]
> [s1469:30369] *** End of error message ***

> and in /var/log/messages I see:
> > Nov  5 14:46:00 s1469 sshd[30363]: Accepted publickey for mostyn from 10.173.132.37 port 36211 ssh2
> > Nov  5 14:46:25 s1469 kernel: TVpd: !ERROR! Async Event:TAVOR_EQE_TYPE_CQ_ERR: (CQ Access Error) cqn:641
> > Nov  5 14:46:25 s1469 kernel: a.out[30374]: segfault at 0000000000000110 rip 00002b528fba5df5 rsp 00000000410010b0 error 4
> > This is reproducible.
> >
> > Is this OpenMPI or your libdapl that's doing this, do you think?
> >
> > + ompi_info
>                 Open MPI: 1.3a1svn11022007
>    Open MPI SVN revision: svn11022007
>                 Open RTE: 1.3a1svn11022007
>    Open RTE SVN revision: svn11022007
>                     OPAL: 1.3a1svn11022007
>        OPAL SVN revision: svn11022007
>                   Prefix: /tools/openmpi/1.3a1r16632_svn/infinicon/gcc64/4.1.2/udapl/suse_sles_10/x86_64/opteron
>  Configured architecture: x86_64-unknown-linux-gnu
>           Configure host: s1471
>            Configured by: root
>            Configured on: Fri Nov  2 16:20:29 PDT 2007
>           Configure host: s1471
>                 Built by: mostyn
>                 Built on: Fri Nov  2 16:30:07 PDT 2007
>               Built host: s1471
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: yes (all)
>       Fortran90 bindings: yes
>  Fortran90 bindings size: small
>               C compiler: gcc
>      C compiler absolute: /usr/bin/gcc
>             C++ compiler: g++
>    C++ compiler absolute: /usr/bin/g++
>       Fortran77 compiler: gfortran
>   Fortran77 compiler abs: /usr/bin/gfortran
>       Fortran90 compiler: gfortran
>   Fortran90 compiler abs: /usr/bin/gfortran
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: yes
>      Fortran90 profiling: yes
>           C++ exceptions: no
>           Thread support: posix (mpi: no, progress: no)
>            Sparse Groups: no
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>    Heterogeneous support: yes
>  mpirun default --prefix: no
>          MPI I/O support: yes
>            MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.3)
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.3)
>            MCA paffinity: linux (MCA v1.0, API v1.1, Component v1.3)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.3)
>            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.3)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.3)
>          MCA installdirs: env (MCA v1.0, API v1.0, Component v1.3)
>          MCA installdirs: config (MCA v1.0, API v1.0, Component v1.3)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.1, Component v1.3)
>                 MCA coll: inter (MCA v1.0, API v1.1, Component v1.3)
>                 MCA coll: self (MCA v1.0, API v1.1, Component v1.3)
>                 MCA coll: sm (MCA v1.0, API v1.1, Component v1.3)
>                 MCA coll: tuned (MCA v1.0, API v1.1, Component v1.3)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.3)
>                MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.3)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.3)
>                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.3)
>                  MCA pml: dr (MCA v1.0, API v1.0, Component v1.3)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.3)
>                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.3)
>               MCA rcache: vma (MCA v1.0, API v1.0, Component v1.3)
>                  MCA btl: self (MCA v1.0, API v1.0.1, Component v1.3)
>                  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.3)
>                  MCA btl: udapl (MCA v1.0, API v1.0, Component v1.3)
>                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.3)
>                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.3)
>                  MCA osc: rdma (MCA v1.0, API v1.0, Component v1.3)
>               MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.3)
>               MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.3)
>               MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.3)
>                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.3)
>                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.3)
>              MCA grpcomm: basic (MCA v1.0, API v2.0, Component v1.3)
>                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.3)
>                  MCA iof: svc (MCA v1.0, API v1.0, Component v1.3)
>                   MCA ns: proxy (MCA v1.0, API v2.0, Component v1.3)
>                   MCA ns: replica (MCA v1.0, API v2.0, Component v1.3)
>                  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA odls: default (MCA v1.0, API v1.3, Component v1.3)
>                  MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.3)
>                  MCA ras: localhost (MCA v1.0, API v1.3, Component v1.3)
>                  MCA ras: slurm (MCA v1.0, API v1.3, Component v1.3)
>                  MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.3)
>                  MCA rds: proxy (MCA v1.0, API v1.3, Component v1.3)
>                MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.3)
>                 MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.3)
>                 MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.3)
>                  MCA rml: oob (MCA v1.0, API v1.0, Component v1.3)
>               MCA routed: tree (MCA v1.0, API v1.0, Component v1.3)
>               MCA routed: unity (MCA v1.0, API v1.0, Component v1.3)
>                  MCA pls: proxy (MCA v1.0, API v1.3, Component v1.3)
>                  MCA pls: rsh (MCA v1.0, API v1.3, Component v1.3)
>                  MCA pls: slurm (MCA v1.0, API v1.3, Component v1.3)
>                  MCA sds: env (MCA v1.0, API v1.0, Component v1.3)
>                  MCA sds: pipe (MCA v1.0, API v1.0, Component v1.3)
>                  MCA sds: seed (MCA v1.0, API v1.0, Component v1.3)
>                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.3)
>                  MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3)
>                MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)


Regards,
Mostyn


On Tue, 6 Nov 2007, Andrew Friedley wrote:

All thread support is disabled by default in Open MPI; the uDAPL BTL is
not thread safe, nor does it make use of a threaded uDAPL implementation.
For completeness, the thread support is controlled by the
--enable-mpi-threads and --enable-progress-threads options to the
configure script.
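
As a sketch only, a configure line that leaves both kinds of thread
support off could look like the following. The install prefix is a
placeholder and --with-udapl is just one way uDAPL support might be
requested; this is not your actual build recipe:

    # threads are off by default, so the two --disable flags are only
    # there to make the choice explicit
    ./configure --prefix=/opt/openmpi-udapl --with-udapl \
        --disable-mpi-threads --disable-progress-threads
    make all install

    # sanity check on the resulting install; with both options off the
    # "Thread support" line should read "posix (mpi: no, progress: no)"
    ompi_info | grep -i "thread support"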

The reference you're seeing to libpthread.so.0 is a side effect of the
way we print backtraces when crashes occur and can be ignored.

How exactly does your MPI program fail?  Make sure you take a look at
http://www.open-mpi.org/community/help/ and provide all relevant
information.
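
Roughly, gathering that tends to come down to something like the sketch
below; the file names and the $PREFIX variable are only illustrative,
and the help page lists exactly what it wants:

    # full parameter/component dump from the build you actually ran with
    ompi_info --all > ompi_info_all.txt

    # keep config.log and the exact configure line from the build tree,
    # then capture the failing run verbatim, mpirun command line included
    mpirun --prefix "$PREFIX" -np 8 -machinefile H ./a.out 2>&1 | tee run.log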

Andrew

Mostyn Lewis wrote:
I'm trying to build a uDAPL Open MPI from last Friday's SVN, using
Qlogic/QuickSilver/SilverStorm 4.1.0.0.1 software. I can get it built,
and it works within a single machine. Over IB between two machines it
fails near the termination of a job. Qlogic says they don't have a
threaded uDAPL (libpthread is in the traceback).

How do you (can you?) configure pthreads away altogether?

Mostyn
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users