On Tue, Jun 7, 2016 at 3:06 AM, Gert Doering <g...@greenie.muc.de> wrote:

> Hi,
>
> On Tue, Jun 07, 2016 at 12:44:20AM -0400, Selva Nair wrote:
> > This allows exit notification to complete and finally trigger SIGTERM.
> > The current practice of allowing a restart in this state clears
> > the exit notification timer data and thus loses the SIGTERM.
>
> This is along the lines I was thinking as well, so I like it better than
> the first try (which was sort of fiddling with the symptoms, but not with
> the underlying signal handling problem).
>
> What if SIGTERM+SIGUSR1 are received in very quick succession, so that
> we haven't even entered "explicit_exit_notification_interval" yet?
>

This patch only addresses the issue of restart if exit notify is ongoing.
The second problem of signals getting overwritten before getting processed
has to be addressed independently. I am not yet sure how best to do this as
there are many places and many ways this can happen.


>
> I'm not sure if I understand the full chain of events that happen on
> a (soft or hard) SIGTERM, what state ->signal_received has when the
> exit notification timer is active, and under which circumstances
> get_signal() is called to collect signal info into the context... this
> is magic stuff...
>

In most cases, the signal gets noticed through the EVENT_LOOP_CHECK_SIG
(same as P2P_CHECK_SIG) which breaks the loop or through 'if (IS_SIG)
return'.
When exit notification is ongoing, signal_received is zero: SIGTERM
received --> processed --> removed and exit-notify started. This is needed
to make sure we don't exit until the notification is completed. The SIGTERM
gets re-instated (signal_received = SIGTERM) when the notification
completes and then we'll exit the next time the signal is checked.

This patch handles only the period when exit-notify is ongoing so a SIGTERM
is impending.


> My original idea was "the place that sets signal_received will not do
> that for SIGUSR1 if a SIGTERM has already been received" (so, de-coupled
> from the exit notification timer), but it seems that this is not easy
> to do.  So maybe your patch is the pragmatic way to solve this.
>

We still have to do something like that but as you say its hard as the
signal is set in different ways -- through throw_signal, register_signal,
directly over-writing signal_received etc.. And no solution is going to be
perfect as checking and setting the signal needs two operations during
which a hard signal may be delivered. But that is something we can live
with, I suppose -- I do not mind having to press ctrl-c twice to get
noticed. But we should make sure soft signals (those internally generated
or received through the management interface) are never lost or
over-written by lower priority signals.

Selva

Reply via email to