I don't have much time right now to try a more recent version, but I'll keep
that in mind. I also dislike the warnings my current version is giving me (
http://www.open-mpi.org/community/lists/devel/2011/08/9606.php). I'll see
how to contact the Ubuntu maintainers about updating Open MPI and solve both
problems in one shot. ;-)

Regards,
Júlio.

2012/3/25 Ralph Castain <r...@open-mpi.org>

>
> On Mar 25, 2012, at 11:28 AM, Júlio Hoffimann wrote:
>
> I mentioned the version in a previous P.S.: Open MPI 1.4.3 from the Ubuntu
> 11.10 repositories. :-)
>
>
> Sorry - I see a lot of emails over the day, and forgot. :-/
>
> Have you tried this on something more recent, like 1.5.4 or even the
> developer's trunk? IIRC, there were some issues in the older 1.4 releases,
> but they have since been fixed.
>
>
> Thanks for the clarifications!
>
> 2012/3/25 Ralph Castain <r...@open-mpi.org>
>
>>
>> On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote:
>>
>> I forgot to mention that I tried setting odls_base_sigkill_timeout as you
>> suggested. Even 5s was not enough for the root to finish its task, and,
>> more importantly, the kill was instantaneous; there was no 5s delay. Hence
>> my erroneous conclusion: SIGKILL was being sent instead of SIGTERM.
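>>
>> For reference, the command I used was along these lines (the application
>> name and process count here are just placeholders):
>>
>> mpirun -mca odls_base_sigkill_timeout 5 -np 4 ./my_app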
>>
>>
>> Which version are you using? Could be a bug in there - I can take a look.
>>
>>
>> About the man page: at least for me, the word "kill" alone is not clear.
>> Mentioning SIGTERM and SIGKILL explicitly would be unambiguous.
>>
>>
>> I'll clarify it - thanks!
>>
>>
>> Regards,
>> Júlio.
>>
>> 2012/3/25 Ralph Castain <r...@open-mpi.org>
>>
>>>
>>> On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:
>>>
>>> Dear Ralph,
>>>
>>> Thank you for your prompt reply. I confirmed what you just said by
>>> reading the mpirun man page, in the sections *Signal Propagation* and
>>> *Process Termination / Signal Handling*.
>>>
>>> "During the run of an MPI  application,  if  any  rank  dies
>>>  abnormally (either exiting before invoking MPI_FINALIZE, or dying as the
>>> result of a signal), mpirun will print out an error message and kill the
>>> rest  of the MPI application."
>>>
>>> If I understood correctly, a SIGKILL signal is sent to every process
>>> when one of them dies prematurely.
>>>
>>>
>>> Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit
>>> within a specified time frame. I told you how to adjust that time period in
>>> the prior message.
>>>
>>> From my point of view, this is a bug. If Open MPI allows handling signals
>>> such as SIGTERM, the other processes in the communicator should also have
>>> the opportunity to die gracefully. Perhaps I'm missing something?
>>>
>>>
>>> Yes, you are - you do get a SIGTERM first, but you are required to exit
>>> in a timely fashion. You are not allowed to continue running. This is
>>> required in order to ensure proper cleanup of the job, per the MPI standard.
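>>>
>>> As a rough sketch (not taken from any OMPI example, just illustrating the
>>> idea), a handler that only records the signal and lets the main loop save
>>> state and exit promptly stays within those rules. The work and checkpoint
>>> functions here are hypothetical placeholders:
>>>
>>> #include <signal.h>
>>> #include <stdlib.h>
>>>
>>> static volatile sig_atomic_t got_sigterm = 0;
>>>
>>> static void term_handler(int sig) { (void)sig; got_sigterm = 1; }
>>>
>>> /* ... */
>>> signal(SIGTERM, term_handler);
>>> while (working) {
>>>     do_one_unit_of_work();     /* hypothetical work function */
>>>     if (got_sigterm) {
>>>         save_checkpoint();     /* hypothetical: must finish quickly */
>>>         exit(EXIT_FAILURE);    /* exit before the SIGKILL arrives */
>>>     }
>>> }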
>>>
>>>
>>> Given the behaviour described in the last paragraph, I think it would be
>>> great to explicitly mention the SIGKILL in the man page, or, even better,
>>> to fix the implementation to send only SIGTERM, making it possible for the
>>> user to clean up all processes before exiting.
>>>
>>>
>>> We already do, as described above.
>>>
>>>
>>> I solved my particular problem by adding another flag,
>>> *unexpected_error_on_slave*:
>>>
>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>> int unexpected_error_on_slave = 0;
>>> enum tag { work_tag, die_tag };
>>>
>>> void my_handler( int sig )
>>> {
>>>     unexpected_error_occurred = 1;
>>> }
>>>
>>> //
>>> // somewhere in the code...
>>> //
>>> signal(SIGTERM, my_handler);
>>>
>>> if (root process) {
>>>
>>>     // do stuff
>>>
>>>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>>>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
>>>
>>>         // save something
>>>
>>>         world.abort(SIGABRT);
>>>     }
>>> } else { // slave process
>>>
>>>     // do different stuff
>>>
>>>     if ( unexpected_error_occurred ) {
>>>
>>>         // just communicate the problem to the root
>>>         world.send(root, die_tag, 1);
>>>         signal(SIGTERM, SIG_DFL);
>>>         while (true)
>>>             ; // wait, the master will take care of this
>>>     }
>>>     world.send(root, die_tag, 0); // everything is fine
>>> }
>>>
>>> signal(SIGTERM, SIG_DFL);  // reassign the default handler
>>> // continues the code...
>>>
>>>
>>> Note that the slave must hang so that the save operation gets executed at
>>> the root; otherwise we are back to the previous scenario. It is
>>> theoretically unnecessary to send MPI messages to accomplish the desired
>>> cleanup, and in more complex applications this can turn into a nightmare.
>>> As we know, asynchronous events are insane to debug.
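>>>
>>> For completeness, here is a self-contained sketch of the same idea using
>>> the plain MPI C API instead of Boost.MPI. It is untested, written for one
>>> root and one slave (mpirun -np 2), and the "work" and "save" parts are
>>> only placeholders:
>>>
>>> #include <mpi.h>
>>> #include <signal.h>
>>> #include <stdlib.h>
>>>
>>> #define DIE_TAG 1
>>>
>>> static volatile sig_atomic_t unexpected_error_occurred = 0;
>>>
>>> static void my_handler(int sig) { (void)sig; unexpected_error_occurred = 1; }
>>>
>>> int main(int argc, char** argv)
>>> {
>>>     int rank;
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     signal(SIGTERM, my_handler);
>>>
>>>     if (rank == 0) {            /* root */
>>>         int error_on_slave = 0;
>>>         /* do stuff ... */
>>>         MPI_Recv(&error_on_slave, 1, MPI_INT, MPI_ANY_SOURCE, DIE_TAG,
>>>                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         if (unexpected_error_occurred || error_on_slave) {
>>>             /* save something ... */
>>>             MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
>>>         }
>>>     } else {                    /* slave */
>>>         int flag = 0;
>>>         /* do different stuff ... */
>>>         if (unexpected_error_occurred) {
>>>             flag = 1;
>>>             MPI_Send(&flag, 1, MPI_INT, 0, DIE_TAG, MPI_COMM_WORLD);
>>>             signal(SIGTERM, SIG_DFL);
>>>             for (;;)            /* hang; the root will abort the job */
>>>                 ;
>>>         }
>>>         MPI_Send(&flag, 1, MPI_INT, 0, DIE_TAG, MPI_COMM_WORLD);
>>>     }
>>>
>>>     signal(SIGTERM, SIG_DFL);   /* reassign the default handler */
>>>     MPI_Finalize();
>>>     return 0;
>>> }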
>>>
>>> Best regards,
>>> Júlio.
>>>
>>> P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.
>>>
>>> 2012/3/23 Ralph Castain <r...@open-mpi.org>
>>>
>>>> Well, yes and no. When a process terminates abnormally, OMPI will kill
>>>> the job - this is done by first hitting each process with a SIGTERM,
>>>> followed shortly thereafter by a SIGKILL. So you do have a short time on
>>>> each process to attempt to clean up.
>>>>
>>>> My guess is that your signal handler actually is getting called, but we
>>>> then kill the process before you can detect that it was called.
>>>>
>>>> You might try adjusting the time between sigterm and sigkill using
>>>> the odls_base_sigkill_timeout MCA param:
>>>>
>>>> mpirun -mca odls_base_sigkill_timeout N
>>>>
>>>> should cause it to wait for N seconds before issuing the sigkill. Not
>>>> sure if that will help or not - it used to work for me, but I haven't
>>>> tried it in a while. What version of OMPI are you using?
>>>>
>>>>
>>>> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I'm trying to handle signals inside an MPI task-farming model. Following
>>>> is pseudo-code of what I'm trying to achieve:
>>>>
>>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>>>
>>>> void my_handler( int sig )
>>>> {
>>>>     unexpected_error_occurred = 1;
>>>> }
>>>>
>>>> //
>>>> // somewhere in the code...
>>>> //
>>>> signal(SIGTERM, my_handler);
>>>>
>>>> if (root process) {
>>>>
>>>>     // do stuff
>>>>
>>>>     if ( unexpected_error_occurred ) {
>>>>
>>>>         // save something
>>>>
>>>>         // re-raise the SIGTERM, but now with the default handler
>>>>         signal(SIGTERM, SIG_DFL);
>>>>         raise(SIGTERM);
>>>>     }
>>>> } else { // slave process
>>>>
>>>>     // do different stuff
>>>>
>>>>     if ( unexpected_error_occurred ) {
>>>>
>>>>         // just propagate the signal to the root
>>>>         signal(SIGTERM, SIG_DFL);
>>>>         raise(SIGTERM);
>>>>     }
>>>> }
>>>>
>>>> signal(SIGTERM, SIG_DFL);  // reassign the default handler
>>>> // continues the code...
>>>>
>>>>
>>>> As can be seen, the signal handling is required for implementing a
>>>> restart feature. The whole problem resides in my assumption that all
>>>> processes in the communicator will receive a SIGTERM as a side effect. Is
>>>> that a valid assumption? How does the actual MPI implementation deal with
>>>> such scenarios?
>>>>
>>>> I also tried replacing all the raise() calls with MPI_Abort(), which,
>>>> according to the documentation (
>>>> http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM
>>>> to all associated processes. The undesired behaviour persists: when a
>>>> slave process is killed, the save section in the root branch is not
>>>> executed.
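>>>>
>>>> Concretely, the replacement I mean looks like this in the C API (the
>>>> error code 1 is an arbitrary choice; with Boost.MPI it would be
>>>> world.abort(1)):
>>>>
>>>> // instead of:
>>>> //     signal(SIGTERM, SIG_DFL);
>>>> //     raise(SIGTERM);
>>>> // use:
>>>> MPI_Abort(MPI_COMM_WORLD, 1);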
>>>>
>>>> Appreciate any help,
>>>> Júlio.
>>>
>>
>
