On Tue, Jun 7, 2016 at 3:06 AM, Gert Doering <g...@greenie.muc.de> wrote:
> Hi, > > On Tue, Jun 07, 2016 at 12:44:20AM -0400, Selva Nair wrote: > > This allows exit notification to complete and finally trigger SIGTERM. > > The current practice of allowing a restart in this state clears > > the exit notification timer data and thus loses the SIGTERM. > > This is along the lines I was thinking as well, so I like it better than > the first try (which was sort of fiddling with the symptoms, but not with > the underlying signal handling problem). > > What if SIGTERM+SIGUSR1 are received in very quick succession, so that > we haven't even entered "explicit_exit_notification_interval" yet? > This patch only addresses the issue of restart if exit notify is ongoing. The second problem of signals getting overwritten before getting processed has to be addressed independently. I am not yet sure how best to do this as there are many places and many ways this can happen. > > I'm not sure if I understand the full chain of events that happen on > a (soft or hard) SIGTERM, what state ->signal_received has when the > exit notification timer is active, and under which circumstances > get_signal() is called to collect signal info into the context... this > is magic stuff... > In most cases, the signal gets noticed through the EVENT_LOOP_CHECK_SIG (same as P2P_CHECK_SIG) which breaks the loop or through 'if (IS_SIG) return'. When exit notification is ongoing, signal_received is zero: SIGTERM received --> processed --> removed and exit-notify started. This is needed to make sure we don't exit until the notification is completed. The SIGTERM gets re-instated (signal_received = SIGTERM) when the notification completes and then we'll exit the next time the signal is checked. This patch handles only the period when exit-notify is ongoing so a SIGTERM is impending. > My original idea was "the place that sets signal_received will not do > that for SIGUSR1 if a SIGTERM has already been received" (so, de-coupled > from the exit notification timer), but it seems that this is not easy > to do. So maybe your patch is the pragmatic way to solve this. > We still have to do something like that but as you say its hard as the signal is set in different ways -- through throw_signal, register_signal, directly over-writing signal_received etc.. And no solution is going to be perfect as checking and setting the signal needs two operations during which a hard signal may be delivered. But that is something we can live with, I suppose -- I do not mind having to press ctrl-c twice to get noticed. But we should make sure soft signals (those internally generated or received through the management interface) are never lost or over-written by lower priority signals. Selva