Hi Vincent & all - FWIW I believe your analysis of the problem with
utempter is correct. I saw the same problem in tmux, and exactly the change
you are suggesting (an explicit kill(getpid(), SIGCHLD)) fixed it completely,
both at the time and ever since.

You are also correct that GotSigChld (and any other globals accessed by
signal handlers) should be typed "volatile sig_atomic_t" to avoid signal
races. AFAIK this is the same as "volatile int" on any platform you are
likely to see.
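For anyone wanting to see the failure mode in isolation, here is a minimal POSIX sketch (tested mentally against Linux semantics, not against screen or tmux). It assumes a program that records SIGCHLD in a "volatile sig_atomic_t" flag and reaps children outside the handler; that structure is an assumption, not screen's actual code. sig_dfl_window() is a hypothetical stand-in for libutempter setting SIGCHLD to SIG_DFL while its helper runs: a child that exits inside that window has its SIGCHLD discarded, and the explicit kill(getpid(), SIGCHLD) afterwards makes the normal handler/reap path run. All function names here are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the handler, read by the main flow: volatile sig_atomic_t. */
static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;            /* only record it; reaping happens elsewhere */
}

static void set_sigchld_action(void (*handler)(int), struct sigaction *old)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, old);
}

/* Reap every exited child without blocking; return how many were reaped. */
static int reap_children(void)
{
    int status, n = 0;
    while (waitpid(-1, &status, WNOHANG) > 0)
        n++;
    return n;
}

/*
 * Hypothetical stand-in for a library call (e.g. libutempter running its
 * helper) that sets SIGCHLD to SIG_DFL for a while. Our child exits inside
 * that window, so its SIGCHLD is discarded and on_sigchld() never runs.
 */
static void sig_dfl_window(pid_t child, int release_fd)
{
    struct sigaction saved;
    siginfo_t si;

    set_sigchld_action(SIG_DFL, &saved);
    close(release_fd);                            /* let the child exit now */
    waitid(P_PID, child, &si, WEXITED | WNOWAIT); /* wait, leave it a zombie */
    sigaction(SIGCHLD, &saved, NULL);             /* restore our handler */
}

/* Returns 1 if the child got reaped; *missed is 1 if its SIGCHLD was lost. */
static int demo(int *missed)
{
    int fds[2];
    char c;
    pid_t child;

    got_sigchld = 0;
    set_sigchld_action(on_sigchld, NULL);
    if (pipe(fds) != 0)
        return 0;

    child = fork();
    if (child < 0)
        return 0;
    if (child == 0) {                   /* child: block until pipe is closed */
        close(fds[1]);
        while (read(fds[0], &c, 1) > 0)
            ;
        _exit(0);
    }
    close(fds[0]);

    sig_dfl_window(child, fds[1]);
    *missed = !got_sigchld;             /* handler never ran: zombie remains */

    /* The workaround: re-raise SIGCHLD so the normal reap path runs. */
    kill(getpid(), SIGCHLD);
    if (got_sigchld)
        return reap_children() >= 1;
    return 0;
}
```

Calling demo(&missed) should report missed == 1 and still reap the child; without the kill() line the handler never runs and the child stays a zombie. The extra SIGCHLD is harmless when nothing has exited, because the reap loop uses WNOHANG and simply finds no children.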


On Thu, 3 Feb 2022, 01:48 Vincent Lefèvre, <invalid.nore...@gnu.org> wrote:

> Follow-up Comment #25, bug #25089 (project screen):
>
> [comment #24:]
> > It doesn't seem to depend on the OS; so far it seems to depend on two
> > things:
> >
> > a) using libutempter (a compile-time decision by the configure script),
> > and
> > b) using a slow computer or a VM.
>
> This is actually more complex. To reproduce the issue, the SIGCHLD needs to
> be received while the action has been set to SIG_DFL by libutempter,
> basically while the libutempter helper is running. I had noticed that
> reducing or disabling the compiler optimizations for screen made the zombies
> less likely to appear on my VM. And this probably depends very much on the
> hardware (e.g., on my laptop, I recently noticed that a race condition in
> the display manager triggered an issue only after I replaced the SSD, just
> because the new SSD is faster).
>
> > The exact part which "needs" to be slow on a host to trigger the race
> > condition, or why it shows up especially on VMs, is still unclear to me.
> > So far I have been unable to reproduce this on a Xen VM, a KVM (Proxmox)
> > based VM, or real hardware.
>
> My VM (at Gandi) is based on Xen. And the issue is 100% reproducible.
>
> I've just tried to reproduce the issue on my laptop by adding a "sleep(1);"
> after the SIGCHLD action has been set to SIG_DFL in libutempter (iface.c),
> but I couldn't. I suppose that the SIGCHLD arrives earlier. I don't know
> which stress-ng options could be used to delay it ("--all 1" completely
> freezes my machine).
>
>     _______________________________________________________
>
> Reply to this item at:
>
>   <https://savannah.gnu.org/bugs/?25089>
>
> _______________________________________________
>   Message sent via Savannah
>   https://savannah.gnu.org/