Hmmm...well, nothing definitive there, I'm afraid.

All I can suggest is to remove/reduce the threading. Like I said, we aren't 
terribly thread safe at this time. I suspect you're stepping into one of those 
non-safe areas here.

Hopefully we will do better in later releases.
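
In the meantime, you might check what thread level the library actually gives 
you before relying on it. A minimal sketch (just an illustration, not taken 
from your code):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int provided;

      /* Ask for full multi-threading; the library reports what it supports. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

      if (provided < MPI_THREAD_MULTIPLE) {
          /* Concurrent MPI calls from several threads aren't supported by
           * this build, so the application has to serialize them itself. */
          fprintf(stderr, "MPI provides thread level %d only\n", provided);
      }

      MPI_Finalize();
      return 0;
  }

If "provided" comes back below MPI_THREAD_MULTIPLE, funneling all MPI calls 
through a single thread (or protecting them with one application-level mutex) 
is the safest workaround until the thread-safety work lands.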

On Sep 6, 2011, at 1:20 PM, Simone Pellegrini wrote:

> On 09/06/2011 04:58 PM, Ralph Castain wrote:
>> On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:
>> 
>>> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>>>> Hi Simone
>>>> 
>>>> Just to clarify: is your application threaded? Could you please send the 
>>>> OMPI configure cmd you used?
>>> Yes, it is threaded. There are basically 3 threads: one for outgoing 
>>> messages (MPI_Send), one for incoming messages (MPI_Iprobe / MPI_Recv), and 
>>> one for spawning.
>>> 
>>> I am not sure what you mean by the OMPI configure cmd I used... I simply run 
>>> mpirun --np 1 ./executable
>> How was OMPI configured when it was installed? If you didn't install it, 
>> then provide the output of ompi_info - it will tell us.
> [@arch-moto tasksys]$ ompi_info
>                 Package: Open MPI nobody@alderaan Distribution
>                Open MPI: 1.5.3
>   Open MPI SVN revision: r24532
>   Open MPI release date: Mar 16, 2011
>                Open RTE: 1.5.3
>   Open RTE SVN revision: r24532
>   Open RTE release date: Mar 16, 2011
>                    OPAL: 1.5.3
>       OPAL SVN revision: r24532
>       OPAL release date: Mar 16, 2011
>            Ident string: 1.5.3
>                  Prefix: /usr
> Configured architecture: x86_64-unknown-linux-gnu
>          Configure host: alderaan
>           Configured by: nobody
>           Configured on: Thu Jul  7 13:21:35 UTC 2011
>          Configure host: alderaan
>                Built by: nobody
>                Built on: Thu Jul  7 13:27:08 UTC 2011
>              Built host: alderaan
>              C bindings: yes
>            C++ bindings: yes
>      Fortran77 bindings: yes (all)
>      Fortran90 bindings: yes
> Fortran90 bindings size: small
>              C compiler: gcc
>     C compiler absolute: /usr/bin/gcc
>  C compiler family name: GNU
>      C compiler version: 4.6.1
>            C++ compiler: g++
>   C++ compiler absolute: /usr/bin/g++
>      Fortran77 compiler: gfortran
>  Fortran77 compiler abs: /usr/bin/gfortran
>      Fortran90 compiler: /usr/bin/gfortran
>  Fortran90 compiler abs:
>             C profiling: yes
>           C++ profiling: yes
>     Fortran77 profiling: yes
>     Fortran90 profiling: yes
>          C++ exceptions: no
>          Thread support: posix (mpi: yes, progress: no)
>           Sparse Groups: no
>  Internal debug support: yes
>  MPI interface warnings: no
>     MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>         libltdl support: yes
>   Heterogeneous support: no
> mpirun default --prefix: no
>         MPI I/O support: yes
>       MPI_WTIME support: gettimeofday
>     Symbol vis. support: yes
>          MPI extensions: affinity example
>   FT Checkpoint support: no (checkpoint thread: no)
>  MPI_MAX_PROCESSOR_NAME: 256
>    MPI_MAX_ERROR_STRING: 256
>     MPI_MAX_OBJECT_NAME: 64
>        MPI_MAX_INFO_KEY: 36
>        MPI_MAX_INFO_VAL: 256
>       MPI_MAX_PORT_NAME: 1024
>  MPI_MAX_DATAREP_STRING: 128
>           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.3)
>          MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.3)
>           MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA carto: file (MCA v2.0, API v2.0, Component v1.5.3)
>           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.3)
>         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.3)
>         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.3)
>           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.3)
>           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: self (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.3)
>                  MCA io: romio (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA pml: v (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA btl: self (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.3)
>                MCA odls: default (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.3)
>               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.3)
>              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ess: env (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.3)
>                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.3)
>             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.3)
>             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.3)
>             MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.3)
>            MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.3)
>            MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.3)
> 
> 
>> 
>>>> Adding the debug flags just changes the race condition. Interestingly, 
>>>> those values only impact the behavior of mpirun, so it looks like the race 
>>>> condition is occurring there.
>>> The problem is that the error is totally nondeterministic. Sometimes it 
>>> happens, sometimes it doesn't, and the error message gives me no clue where 
>>> it is coming from. Is it a problem in my code or internal to MPI?
>> Can't tell, but it is likely an impact of threading. Race conditions within 
>> threaded environments are common, and OMPI isn't particularly thread safe, 
>> especially when it comes to comm_spawn.
>> 
>>> cheers, Simone
>>>> 
>>>> On Sep 6, 2011, at 3:01 AM, Simone Pellegrini wrote:
>>>> 
>>>>> Dear all,
>>>>> I am developing an MPI application which uses MPI_Spawn heavily. Usually 
>>>>> everything works fine for the first hundred spawns, but after a while the 
>>>>> application exits with a curious message:
>>>>> 
>>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read 
>>>>> past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read 
>>>>> past end of buffer in file grpcomm_bad_module.c at line 518
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or 
>>>>> environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>> 
>>>>>  ompi_proc_set_arch failed
>>>>>  -->   Returned "Data unpack would read past end of buffer" (-26) instead 
>>>>> of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [arch-top:27712] Abort before MPI_INIT completed successfully; not able 
>>>>> to guarantee that all other processes were killed!
>>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read 
>>>>> past end of buffer in file base/grpcomm_base_modex.c at line 349
>>>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read 
>>>>> past end of buffer in file grpcomm_bad_module.c at line 518
>>>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [arch-top:27714] Abort before MPI_INIT completed successfully; not able 
>>>>> to guarantee that all other processes were killed!
>>>>> [arch-top:27226] 1 more process has sent help message help-mpi-runtime / 
>>>>> mpi_init:startup:internal-failure
>>>>> [arch-top:27226] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
>>>>> all help / error messages
>>>>> 
>>>>> Using MPI_Init instead of MPI_Init_thread does not help either; the same 
>>>>> error occurs.
>>>>> 
>>>>> Strangely, the error does not occur if I run the code with debugging 
>>>>> enabled (-mca plm_base_verbose 5 -mca rmaps_base_verbose 5).
>>>>> 
>>>>> I am using Open MPI 1.5.3
>>>>> 
>>>>> cheers, Simone