I forgot to mention: I tried setting odls_base_sigkill_timeout as you
suggested. Even 5s was not sufficient for the root to execute its task, and,
most importantly, the kill was instantaneous; there was no 5s delay. Hence my
erroneous conclusion that SIGKILL was being sent instead of SIGTERM.
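
For reference, the invocation followed the form from your earlier message
(executable name and process count are just placeholders):

mpirun -mca odls_base_sigkill_timeout 5 -np 4 ./my_program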

About the man page: at least for me, the word "kill" is not clear. Using the
SIGTERM+SIGKILL keywords explicitly would be unambiguous.

Regards,
Júlio.

2012/3/25 Ralph Castain <r...@open-mpi.org>

>
> On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:
>
> Dear Ralph,
>
> Thank you for your prompt reply. I confirmed what you just said by reading
> the mpirun man page, in the sections *Signal Propagation* and *Process
> Termination / Signal Handling*.
>
> "During the run of an MPI  application,  if  any  rank  dies
>  abnormally (either exiting before invoking MPI_FINALIZE, or dying as the
> result of a signal), mpirun will print out an error message and kill the
> rest  of the MPI application."
>
> If I understood correctly, the SIGKILL signal is sent to every process upon
> a premature death.
>
>
> Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit
> within a specified time frame. I told you how to adjust that time period in
> the prior message.
>
> From my point of view, this is a bug. If Open MPI allows handling signals
> such as SIGTERM, the other processes in the communicator should also have
> the opportunity to die gracefully. Perhaps I'm missing something?
>
>
> Yes, you are - you do get a SIGTERM first, but you are required to exit in
> a timely fashion. You are not allowed to continue running. This is required
> in order to ensure proper cleanup of the job, per the MPI standard.
>
>
> Given the behaviour described in the last paragraph, I think it would be
> great to explicitly mention SIGKILL in the man page, or, even better, to fix
> the implementation to send SIGTERM instead, making it possible for the user
> to clean up all processes before exit.
>
>
> We already do, as described above.
>
>
> I solved my particular problem by adding another flag *
> unexpected_error_on_slave*:
>
> volatile sig_atomic_t unexpected_error_occurred = 0;
> int unexpected_error_on_slave = 0;
> enum tag { work_tag, die_tag };
>
> void my_handler( int sig )
> {
>     unexpected_error_occurred = 1;
> }
>
> //
> // somewhere in the code...
> //
> signal(SIGTERM, my_handler);
>
> if (root process) {
>
>     // do stuff
>
>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
>
>         // save something
>
>         world.abort(SIGABRT);
>     }
> } else { // slave process
>
>     // do different stuff
>
>     if ( unexpected_error_occurred ) {
>
>         // just communicate the problem to the root
>         world.send(root, die_tag, 1);
>         signal(SIGTERM, SIG_DFL);
>         while (true)
>             ; // wait, master will take care of this
>     }
>     world.send(root, die_tag, 0); // everything is fine
> }
>
> signal(SIGTERM, SIG_DFL); // reassign default handler
> // continues the code...
>
>
> Note the slave must hang so that the save operation gets executed at the
> root; otherwise we are back to the previous scenario. Sending MPI messages
> should, in theory, be unnecessary to accomplish the desired cleanup, and in
> more complex applications this can turn into a nightmare. As we know,
> asynchronous events are insane to debug.
>
> Best regards,
> Júlio.
>
> P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.
>
> 2012/3/23 Ralph Castain <r...@open-mpi.org>
>
>> Well, yes and no. When a process abnormally terminates, OMPI will kill
>> the job - this is done by first hitting each process with a SIGTERM,
>> followed shortly thereafter by a SIGKILL. So you do have a short time on
>> each process to attempt cleanup.
>>
>> My guess is that your signal handler actually is getting called, but we
>> then kill the process before you can detect that it was called.
>>
>> You might try adjusting the time between SIGTERM and SIGKILL using
>> the odls_base_sigkill_timeout MCA param:
>>
>> mpirun -mca odls_base_sigkill_timeout N
>>
>> should cause it to wait for N seconds before issuing the SIGKILL. Not
>> sure if that will help or not - it used to work for me, but I haven't tried
>> it in a while. What versions of OMPI are you using?
>>
>>
>> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>>
>> Dear all,
>>
>> I'm trying to handle signals inside an MPI task-farming model. Following
>> is pseudo-code of what I'm trying to achieve:
>>
>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>
>> void my_handler( int sig )
>> {
>>     unexpected_error_occurred = 1;
>> }
>>
>> //
>> // somewhere in the code...
>> //
>> signal(SIGTERM, my_handler);
>>
>> if (root process) {
>>
>>     // do stuff
>>
>>     if ( unexpected_error_occurred ) {
>>
>>         // save something
>>
>>         // re-raise SIGTERM, but now with the default handler
>>         signal(SIGTERM, SIG_DFL);
>>         raise(SIGTERM);
>>     }
>> } else { // slave process
>>
>>     // do different stuff
>>
>>     if ( unexpected_error_occurred ) {
>>
>>         // just propagate the signal to the root
>>         signal(SIGTERM, SIG_DFL);
>>         raise(SIGTERM);
>>     }
>> }
>>
>> signal(SIGTERM, SIG_DFL); // reassign default handler
>> // continues the code...
>>
>>
>> As can be seen, the signal handling is required for implementing a restart
>> feature. The whole problem lies in the assumption I made that all processes
>> in the communicator will receive a SIGTERM as a side effect. Is that a
>> valid assumption? How does the actual MPI implementation deal with such
>> scenarios?
>>
>> I also tried replacing all the raise() calls with MPI_Abort(), which,
>> according to the documentation (
>> http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM
>> to all associated processes. The undesired behaviour persists: when killing
>> a slave process, the save section in the root branch is not executed.
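>>
>> To be concrete, the replacement was roughly of this form (just a sketch;
>> with Boost.MPI the equivalent call would be world.abort()):
>>
>> // instead of re-raising the signal:
>> // signal(SIGTERM, SIG_DFL);
>> // raise(SIGTERM);
>> MPI_Abort(MPI_COMM_WORLD, 1); // non-zero error code, the value is arbitrary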
>>
>> Appreciate any help,
>> Júlio.