On Sep 6, 2011, at 12:49 PM, Simone Pellegrini wrote:

> On 09/06/2011 02:57 PM, Ralph Castain wrote:
>> Hi Simone,
>>
>> Just to clarify: is your application threaded? Could you please send the
>> OMPI configure command you used?
>
> Yes, it is threaded. There are basically 3 threads: one for outgoing
> messages (MPI_Send), one for incoming messages (MPI_Iprobe / MPI_Recv),
> and one for spawning.
>
> I am not sure what you mean by the OMPI configure command I used... I simply do
> mpirun --np 1 ./executable
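For context, here is a minimal sketch of the kind of three-thread setup described above, assuming the application requests MPI_THREAD_MULTIPLE at startup. The thread bodies and function names are illustrative placeholders, not Simone's actual code:

    /* Minimal sketch, not Simone's code: request full thread support and
     * verify it before letting three threads make MPI calls concurrently. */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    static void *sender_thread(void *arg)   { /* MPI_Send loop */              return NULL; }
    static void *receiver_thread(void *arg) { /* MPI_Iprobe / MPI_Recv loop */ return NULL; }
    static void *spawner_thread(void *arg)  { /* MPI_Comm_spawn loop */        return NULL; }

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "need MPI_THREAD_MULTIPLE, got level %d\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        pthread_t t[3];
        pthread_create(&t[0], NULL, sender_thread,   NULL);
        pthread_create(&t[1], NULL, receiver_thread, NULL);
        pthread_create(&t[2], NULL, spawner_thread,  NULL);
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);

        MPI_Finalize();
        return 0;
    }

If the level returned in provided is lower than MPI_THREAD_MULTIPLE, concurrent MPI calls from three threads are not covered by the standard, which matters for the thread-safety discussion below.
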
How was OMPI configured when it was installed? If you didn't install it,
then provide the output of ompi_info - it will tell us.

>
>> Adding the debug flags just changes the race condition. Interestingly, those
>> values only impact the behavior of mpirun, so it looks like the race
>> condition is occurring there.
>
> The problem is that the error is totally nondeterministic. Sometimes it
> happens, sometimes not, but the error message gives me no clue where the
> error is coming from. Is it a problem in my code or internal to MPI?

Can't tell, but it is likely an impact of threading. Race conditions within
threaded environments are common, and OMPI isn't particularly thread-safe,
especially when it comes to comm_spawn.

>
> cheers, Simone
>>
>>
>> On Sep 6, 2011, at 3:01 AM, Simone Pellegrini wrote:
>>
>>> Dear all,
>>> I am developing an MPI application which makes heavy use of MPI_Comm_spawn.
>>> Usually everything works fine for the first hundred spawns, but after a
>>> while the application exits with a curious message:
>>>
>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
>>> past end of buffer in file base/grpcomm_base_modex.c at line 349
>>> [arch-top:27712] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
>>> past end of buffer in file grpcomm_bad_module.c at line 518
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or environment
>>> problems. This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>> ompi_proc_set_arch failed
>>> --> Returned "Data unpack would read past end of buffer" (-26) instead of
>>> "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [arch-top:27712] Abort before MPI_INIT completed successfully; not able to
>>> guarantee that all other processes were killed!
>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
>>> past end of buffer in file base/grpcomm_base_modex.c at line 349
>>> [arch-top:27714] [[36904,165],0] ORTE_ERROR_LOG: Data unpack would read
>>> past end of buffer in file grpcomm_bad_module.c at line 518
>>> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [arch-top:27714] Abort before MPI_INIT completed successfully; not able to
>>> guarantee that all other processes were killed!
>>> [arch-top:27226] 1 more process has sent help message help-mpi-runtime /
>>> mpi_init:startup:internal-failure
>>> [arch-top:27226] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>>> all help / error messages
>>>
>>> Also using MPI_Init instead of MPI_Init_thread does not help; the same
>>> error occurs.
>>>
>>> Strangely, the error does not occur if I run the code with debugging
>>> enabled (-mca plm_base_verbose 5 -mca rmaps_base_verbose 5).
>>>
>>> I am using Open MPI 1.5.3
>>>
>>> cheers, Simone
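To make the failure mode concrete: the messages quoted above appear to come from freshly spawned children (pids 27712 and 27714) failing inside their own MPI_Init, during the grpcomm modex where newly launched processes exchange process information. A rough illustration of the repeated-spawn pattern being described, filling in the spawner_thread placeholder from the earlier sketch; "./worker" and the loop count are placeholders, not Simone's code:

    #include <mpi.h>
    #include <stddef.h>

    /* Illustrative only: spawn one child per iteration over MPI_COMM_SELF,
     * exchange work with it, then disconnect.  The errors quoted above
     * suggest the children's MPI_Init starts failing after a few hundred
     * such iterations. */
    static void *spawner_thread(void *arg)
    {
        for (int i = 0; i < 200; i++) {
            MPI_Comm child;
            MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
            /* ... hand the intercommunicator to the send/receive threads ... */
            MPI_Comm_disconnect(&child);
        }
        return NULL;
    }

Because the modex runs inside each child's MPI_Init, a race in the runtime layer would surface there rather than in the parent's own MPI calls, which matches where the error log points.
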
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users