On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:

> Dear Ralph,
> 
> Thank you for your prompt reply. I confirmed what you said by reading the 
> mpirun man page, in the sections Signal Propagation and Process Termination 
> / Signal Handling.
> 
> "During the run of an MPI  application,  if  any  rank  dies  abnormally 
> (either exiting before invoking MPI_FINALIZE, or dying as the result of a 
> signal), mpirun will print out an error message and kill the rest  of the MPI 
> application."
> 
> If I understood correctly, the SIGKILL signal is sent to every process upon 
> a premature death.

Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit within a 
specified time frame. I told you how to adjust that time period in the prior 
message.
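
To make "exit within that time frame" concrete, here is a minimal sketch 
(checkpoint_state() is a placeholder, not an Open MPI call): the handler only 
sets a flag, the work loop polls it, saves what it can, and returns promptly.

#include <signal.h>
#include <stdlib.h>

volatile sig_atomic_t got_sigterm = 0;

void term_handler(int sig) { got_sigterm = 1; }

int main()
{
    signal(SIGTERM, term_handler);

    while (!got_sigterm) {
        // one bounded chunk of work per iteration, then re-check the flag
    }

    // checkpoint_state(); // hypothetical: quickly write restart data
    return EXIT_FAILURE;   // give up before the SIGKILL deadline expires
}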

> From my point of view, this is a bug. If Open MPI allows handling signals 
> such as SIGTERM, the other processes in the communicator should also have 
> the opportunity to die gracefully. Perhaps I'm missing something?

Yes, you are - you do get a SIGTERM first, but you are required to exit in a 
timely fashion. You are not allowed to continue running. This is required in 
order to ensure proper cleanup of the job, per the MPI standard.
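
If your main loop cannot poll a flag, one option (again only a sketch) is to 
do the minimal cleanup directly in the handler, restricted to 
async-signal-safe calls, and then _exit:

#include <signal.h>
#include <unistd.h>

void term_now(int sig)
{
    // only async-signal-safe functions may be called from a handler
    const char msg[] = "rank got SIGTERM, exiting\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1); // terminate immediately; mpirun then reaps the job
}

// somewhere in the code...
// signal(SIGTERM, term_now);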

> 
> Supposing the behaviour described in the last paragraph, I think it would 
> be great to explicitly mention the SIGKILL in the man page, or, even better, 
> to fix the implementation to send SIGTERM instead, making it possible for 
> the user to clean up all processes before exiting.

We already do, as described above.

> 
> I solved my particular problem by adding another flag, 
> unexpected_error_on_slave:
> 
> volatile sig_atomic_t unexpected_error_occurred = 0;
> int unexpected_error_on_slave = 0;
> enum tag { work_tag, die_tag };
> 
> void my_handler( int sig )
> {
>     unexpected_error_occurred = 1;
> }
> 
> //
> // somewhere in the code...
> //
> 
> signal(SIGTERM, my_handler);
> 
> if (root process) {
> 
>     // do stuff
> 
>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
> 
>         // save something
> 
>         world.abort(SIGABRT);
>     }
> }
> else { // slave process
> 
>     // do different stuff
> 
>     if ( unexpected_error_occurred ) {
> 
>         // just communicate the problem to the root
>         world.send(root,die_tag,1);
>         signal(SIGTERM,SIG_DFL);
>         while(true)
>             ; // wait, master will take care of this
>     }
>     world.send(root,die_tag,0); // everything is fine
> }
> 
> signal(SIGTERM, SIG_DFL);                       // reassign default handler
> 
> // continues the code...
> 
> Note the slave must hang so that the save operation gets executed at the 
> root; otherwise we are back to the previous scenario. It should theoretically 
> be unnecessary to send MPI messages to accomplish the desired cleanup, and in 
> more complex applications this can turn into a nightmare. As we know, 
> asynchronous events are insane to debug.
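> 
> A slightly gentler variant of the hang, assuming burning a core is 
> undesirable: block in pause() instead of spinning, so the slave still waits 
> for mpirun's SIGKILL but without consuming CPU.
> 
> #include <unistd.h> // for pause()
> 
> signal(SIGTERM, SIG_DFL);
> while (true)
>     pause(); // sleep until a signal arrives; mpirun's SIGKILL ends us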
> 
> Best regards,
> Júlio.
> 
> P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.
> 
> 2012/3/23 Ralph Castain <r...@open-mpi.org>
> Well, yes and no. When a process abnormally terminates, OMPI will kill the 
> job - this is done by first hitting each process with a SIGTERM, followed 
> shortly thereafter by a SIGKILL. So you do have a short time on each process 
> to attempt to cleanup.
> 
> My guess is that your signal handler actually is getting called, but we then 
> kill the process before you can detect that it was called.
> 
> You might try adjusting the time between the SIGTERM and the SIGKILL using 
> the odls_base_sigkill_timeout MCA param:
> 
> mpirun -mca odls_base_sigkill_timeout N
> 
> should cause it to wait N seconds before issuing the SIGKILL (see the 
> example below). Not sure if that will help or not - it used to work for me, 
> but I haven't tried it in a while. What version of OMPI are you using?
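> 
> To be concrete, a hypothetical invocation (everything except the MCA param 
> is a placeholder):
> 
> mpirun -mca odls_base_sigkill_timeout 10 -np 4 ./my_app
> 
> That should give each process roughly ten seconds to react to the SIGTERM 
> before the SIGKILL arrives.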
> 
> 
> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
> 
>> Dear all,
>> 
>> I'm trying to handle signals inside an MPI task-farming model. The 
>> following is pseudo-code of what I'm trying to achieve:
>> 
>> volatile sig_atomic_t unexpected_error_occurred = 0;
>> 
>> void my_handler( int sig )
>> {
>>     unexpected_error_occurred = 1;
>> }
>> 
>> //
>> // somewhere in the code...
>> //
>> 
>> signal(SIGTERM, my_handler);
>> 
>> if (root process) {
>> 
>>     // do stuff
>> 
>>     if ( unexpected_error_occurred ) {
>> 
>>         // save something
>> 
>>         // re-raise SIGTERM, now with the default handler
>>         signal(SIGTERM, SIG_DFL);
>>         raise(SIGTERM);
>>     }
>> }
>> else { // slave process
>> 
>>     // do different stuff
>> 
>>     if ( unexpected_error_occurred ) {
>> 
>>         // just propagate the signal to the root
>>         signal(SIGTERM, SIG_DFL);
>>         raise(SIGTERM);
>>     }
>> }
>> 
>> signal(SIGTERM, SIG_DFL);                       // reassign default handler
>> 
>> // continues the code...
>> 
>> As can be seen, the signal handling is required to implement a restart 
>> feature. The whole problem resides in my assumption that all processes in 
>> the communicator will receive a SIGTERM as a side effect. Is that a valid 
>> assumption? How does the actual MPI implementation deal with such scenarios?
>> 
>> I also tried to replace all the raise() calls with MPI_Abort(), which, 
>> according to the documentation 
>> (http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM to 
>> all associated processes. The undesired behaviour persists: when a slave 
>> process is killed, the save section in the root branch is not executed.
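>> 
>> For concreteness, a minimal sketch of the MPI_Abort variant (illustrative, 
>> not my full code); note that the second argument is an error code handed 
>> back to the environment, not a signal number:
>> 
>> #include <mpi.h>
>> 
>> if (unexpected_error_occurred)
>>     MPI_Abort(MPI_COMM_WORLD, 1); // ask the runtime to tear the job down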
>> 
>> Appreciate any help,
>> Júlio.