On Mar 25, 2012, at 10:57 AM, Júlio Hoffimann wrote:

> I forgot to mention, I tried to set odls_base_sigkill_timeout as you 
> suggested. Even 5s was not sufficient for the root to execute its task, 
> and, most important, the kill was instantaneous: there was no 5s delay. 
> Hence my erroneous conclusion that SIGKILL was being sent instead of 
> SIGTERM.

Which version are you using? Could be a bug in there - I can take a look.

> 
> About the man page: at least for me, the word "kill" is not clear. Spelling 
> out SIGTERM+SIGKILL would be unambiguous.

I'll clarify it - thanks!

> 
> Regards,
> Júlio.
> 
> 2012/3/25 Ralph Castain <r...@open-mpi.org>
> 
> On Mar 25, 2012, at 7:19 AM, Júlio Hoffimann wrote:
> 
>> Dear Ralph,
>> 
>> Thank you for your prompt reply. I confirmed what you just said by reading 
>> the mpirun man page, in the sections Signal Propagation and Process 
>> Termination / Signal Handling.
>> 
>> "During the run of an MPI  application,  if  any  rank  dies  abnormally 
>> (either exiting before invoking MPI_FINALIZE, or dying as the result of a 
>> signal), mpirun will print out an error message and kill the rest  of the 
>> MPI application."
>> 
>> If I understood correctly, the SIGKILL signal is sent to every process upon 
>> a premature death.
> 
> Each process receives a SIGTERM, and then a SIGKILL if it doesn't exit within 
> a specified time frame. I told you how to adjust that time period in the 
> prior message.
> 
>> From my point of view, this is a bug. If Open MPI allows handling signals 
>> such as SIGTERM, the other processes in the communicator should also have 
>> the opportunity to die gracefully. Perhaps I'm missing something?
> 
> Yes, you are - you do get a SIGTERM first, but you are required to exit in a 
> timely fashion. You are not allowed to continue running. This is required in 
> order to ensure proper cleanup of the job, per the MPI standard.
> 
>> 
>> Assuming the behaviour described in the last paragraph, I think it would be 
>> great to mention the SIGKILL explicitly in the man page, or, even better, 
>> to fix the implementation to send SIGTERM instead, making it possible for 
>> the user to clean up all processes before exit.
> 
> We already do, as described above.
> 
>> 
>> I solved my particular problem by adding another flag 
>> unexpected_error_on_slave:
>> 
>> #include <signal.h>         // signal(), sig_atomic_t, SIGTERM
>> #include <boost/mpi.hpp>    // Boost.MPI communicator "world" used below
>> namespace mpi = boost::mpi;
>> 
>> volatile sig_atomic_t unexpected_error_occurred = 0;
>> int unexpected_error_on_slave = 0;
>> enum tag { work_tag, die_tag };
>> 
>> void my_handler( int sig )
>> {
>>     unexpected_error_occurred = 1;   // only set a flag: async-signal-safe
>> }
>> 
>> //
>> // somewhere in the code...
>> //
>> 
>> signal(SIGTERM, my_handler);
>> 
>> if (world.rank() == root) {   // root process
>> 
>>     // do stuff
>> 
>>     // block until a slave reports its status (0 = ok, 1 = error)
>>     world.recv(mpi::any_source, die_tag, unexpected_error_on_slave);
>>     if ( unexpected_error_occurred || unexpected_error_on_slave ) {
>> 
>>         // save something
>> 
>>         world.abort(SIGABRT);   // MPI_Abort over the whole job
>>     }
>> }
>> else { // slave process
>> 
>>     // do different stuff
>> 
>>     if ( unexpected_error_occurred ) {
>> 
>>         // just communicate the problem to the root
>>         world.send(root, die_tag, 1);
>>         signal(SIGTERM, SIG_DFL);
>>         while (true)
>>             ;   // wait, master will take care of this
>>     }
>>     world.send(root, die_tag, 0);   // everything is fine
>> }
>> 
>> signal(SIGTERM, SIG_DFL);                       // reassign default handler
>> 
>> // continues the code...
>> 
>> Note the slave must hang for the save operation to get executed at the 
>> root; otherwise we are back to the previous scenario. It should 
>> theoretically be unnecessary to send MPI messages to accomplish the desired 
>> cleanup, and in more complex applications this can turn into a nightmare. 
>> As we know, asynchronous events are insane to debug.
>> 
>> Best regards,
>> Júlio.
>> 
>> P.S.: Open MPI 1.4.3 from the Ubuntu 11.10 repositories.
>> 
>> 2012/3/23 Ralph Castain <r...@open-mpi.org>
>> Well, yes and no. When a process abnormally terminates, OMPI will kill the 
>> job - this is done by first hitting each process with a SIGTERM, followed 
>> shortly thereafter by a SIGKILL. So you do have a short time on each 
>> process to attempt cleanup.
>> 
>> My guess is that your signal handler actually is getting called, but we then 
>> kill the process before you can detect that it was called.
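>> 
>> One way to check that guess (again, just a sketch; the file name is 
>> arbitrary): have the handler leave a persistent trace using only 
>> async-signal-safe calls, so the evidence survives even if the process is 
>> SIGKILLed a moment later.
>> 
>> #include <signal.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> 
>> void trace_handler( int sig )
>> {
>>     // open/write/close are async-signal-safe, unlike printf
>>     int fd = open("/tmp/got_sigterm", O_CREAT | O_WRONLY | O_APPEND, 0644);
>>     if (fd >= 0) {
>>         write(fd, "SIGTERM arrived\n", 16);
>>         close(fd);
>>     }
>> }
>> 
>> int main()
>> {
>>     signal(SIGTERM, trace_handler);
>>     for (;;) pause();   // then check whether /tmp/got_sigterm exists
>> }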
>> 
>> You might try adjusting the time between the SIGTERM and the SIGKILL using 
>> the odls_base_sigkill_timeout MCA param:
>> 
>> mpirun -mca odls_base_sigkill_timeout N
>> 
>> should cause it to wait for N seconds before issuing the SIGKILL. Not sure 
>> if that will help or not - it used to work for me, but I haven't tried it 
>> for a while. What version of OMPI are you using?
>> 
>> 
>> On Mar 22, 2012, at 4:49 PM, Júlio Hoffimann wrote:
>> 
>>> Dear all,
>>> 
>>> I'm trying to handle signals inside an MPI task-farming model. The 
>>> following is pseudo-code of what I'm trying to achieve:
>>> 
>>> volatile sig_atomic_t unexpected_error_occurred = 0;
>>> 
>>> void my_handler( int sig )
>>> {
>>>     unexpected_error_occurred = 1;
>>> }
>>> 
>>> //
>>> // somewhere in the code...
>>> //
>>> 
>>> signal(SIGTERM, my_handler);
>>> 
>>> if (root process) {
>>> 
>>>     // do stuff
>>> 
>>>     if ( unexpected_error_occurred ) {
>>> 
>>>         // save something
>>> 
>>>         // reraise the SIGTERM again, but now with the default handler
>>>         signal(SIGTERM, SIG_DFL);
>>>         raise(SIGTERM);
>>>     }
>>> }
>>> else { // slave process
>>> 
>>>     // do different stuff
>>> 
>>>     if ( unexpected_error_occurred ) {
>>> 
>>>         // just propagate the signal to the root
>>>         signal(SIGTERM, SIG_DFL);
>>>         raise(SIGTERM);
>>>     }
>>> }
>>> 
>>> signal(SIGTERM, SIG_DFL);                       // reassign default handler
>>> 
>>> // continues the code...
>>> 
>>> As can be seen, the signal handling is required for implementing a restart 
>>> feature. The whole problem resides in my assumption that all processes in 
>>> the communicator will receive a SIGTERM as a side effect. Is that a valid 
>>> assumption? How does the actual MPI implementation deal with such 
>>> scenarios?
>>> 
>>> I also tried replacing all the raise() calls with MPI_Abort(), which, 
>>> according to the documentation 
>>> (http://www.open-mpi.org/doc/v1.5/man3/MPI_Abort.3.php), sends a SIGTERM 
>>> to all associated processes. The undesired behaviour persists: when a 
>>> slave process is killed, the save section in the root branch is not 
>>> executed.
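>>> 
>>> For reference, the replacement in the slave branch looked roughly like 
>>> this (assuming <mpi.h> is included; the error code is arbitrary):
>>> 
>>>     if ( unexpected_error_occurred ) {
>>> 
>>>         // ask the MPI layer to kill the whole job instead of raising
>>>         // SIGTERM locally; the other ranks are signalled, but the save
>>>         // section at the root still never runs
>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>     }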
>>> 
>>> Appreciate any help,
>>> Júlio.
