Hi Takashi,

On Dec  5 10:14, Takashi Yano wrote:
> Hi Corinna,
> 
> On Wed,  4 Dec 2024 13:54:47 +0100
> Corinna Vinschen wrote:
> > From: Corinna Vinschen <cori...@vinschen.de>
> > 
> > Commit 0b6fbd396ca2f ("* exceptions.cc (_cygtls::interrupt_now): Revert
> > to checking for "spinning" when choosing to defer signal.") introduced
> > a bug in the loop inside the stabilize_sig_stack subroutine:
> > 
> > First, stabilize_sig_stack grabs the stacklock. The _cygtls::incyg
> > flag is then incremented before checking if a signal has to be handled
> > for the current thread.
> > 
> > If no signal waits, the code simply jumps out, decrements _cygtls::incyg
> > and returns to the caller, which eventually releases the stacklock.
> > 
> > However, if a signal is waiting, stabilize_sig_stack releases the
> > stacklock, calls _cygtls::call_signal_handler(), and returns to
> > the start of the subroutine, trying to grab the lock.
> > 
> > After grabbing the lock, it increments _cygtls::incyg... wait...
> > again?
> > 
> > The loop does not decrement _cygtls::incyg after
> > _cygtls::call_signal_handler(), which returns with _cygtls::incyg
> > set to 1.  So it increments incyg to 2.  If no other signal is
> > waiting, stabilize_sig_stack jumps out and decrements _cygtls::incyg
> > to 1.  Eventually, setjmp or longjmp both will return to user
> > code with _cygtls::incyg set to 1.  This *may* be fixed at some later
> > point when signals arrive, but there will be a time when the application
> > runs in user code with broken signal handling.
> > 
> > Fixes: 0b6fbd396ca2f ("* exceptions.cc (_cygtls::interrupt_now): Revert to 
> > checking for "spinning" when choosing to defer signal.")
> > Signed-off-by: Corinna Vinschen <cori...@vinschen.de>
> > ---
> >  winsup/cygwin/scripts/gendef | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/winsup/cygwin/scripts/gendef b/winsup/cygwin/scripts/gendef
> > index 7e14f69cf71c..377ceb59b2c8 100755
> > --- a/winsup/cygwin/scripts/gendef
> > +++ b/winsup/cygwin/scripts/gendef
> > @@ -344,6 +344,7 @@ stabilize_sig_stack:
> >     movq    \$_cygtls.start_offset,%rcx     # point to beginning
> >     addq    %r12,%rcx                       #  of tls block
> >     call    _ZN7_cygtls19call_signal_handlerEv
> > +   decl    _cygtls.incyg(%r12)
> >     jmp     1b
> >  3: decl    _cygtls.incyg(%r12)
> >     addq    \$0x20,%rsp
> > -- 
> > 2.47.0
> > 
> 
> I tested this patch with Christian's longjmp test case, but
> the problem does not seem to be fixed.

That was not the intention.  It was just something I found while looking
into the assembler code.  This looks like a long-standing bug, which,
if my description above is right, might result in threads missing
signals, too.  Or at least, signals being defered, because user-space
is running partially with the "iscyg" flag being set.

It would be nice if you could check this, too, to be sure I'm not
totally wrong here.

> However, if additional patch attached as well as this patch are
> applied, the problem does not happen anymore. The additional
> patch removes the spinning flag completely.
> 
> What do you think?

Yeah, that's missing yet.  We don't have any documentation what exactly
the spinning flag is supposed to accomplish.  There's just this little
comment in _cygtls::interrupt_now:

  Delay the interrupt if we are
  [...]
  2) in _sigfe (spinning is true) and about to enter cygwin DLL

Yeah, nice... but *why*?  The fact that we're inside the DLL is guarded
by the "iscyg" flag anyway.  So what does "spinning" really accomplish,
which "incyg" doesn't?

I think for fixing this issue we can try to remove setting the spinning
flag from stabilize_sig_stack.  setjmp/longjmp only enter the DLL in
case of a waiting signal and then only via call_signal_handler(), not
via some exported entry point.  "incyg" should be really sufficient for
that.  This we could cherry-pick into 3.5.5.

In case of _sigfe/_sigbe, I'm not so sure yet. 3.6 might take some
more time anyway, so we could do a followup patch along the lines
of your patch below and see what happens.  I really hope we don't
get another, hard to debug fallout there...


Thanks,
Corinna

> 
> -- 
> Takashi Yano <takashi.y...@nifty.ne.jp>

> diff --git a/winsup/cygwin/cygtls.cc b/winsup/cygwin/cygtls.cc
> index 2842c2733..bfaa19867 100644
> --- a/winsup/cygwin/cygtls.cc
> +++ b/winsup/cygwin/cygtls.cc
> @@ -81,7 +81,7 @@ _cygtls::fixup_after_fork ()
>        pop ();
>        current_sig = 0;
>      }
> -  stacklock = spinning = 0;
> +  stacklock = 0;
>    signal_arrived = NULL;
>    locals.select.sockevt = NULL;
>    locals.cw_timer = NULL;
> diff --git a/winsup/cygwin/exceptions.cc b/winsup/cygwin/exceptions.cc
> index 35a4a0b47..4dc4be278 100644
> --- a/winsup/cygwin/exceptions.cc
> +++ b/winsup/cygwin/exceptions.cc
> @@ -920,9 +920,8 @@ _cygtls::interrupt_now (CONTEXT *cx, siginfo_t& si, void 
> *handler,
>  
>    /* Delay the interrupt if we are
>       1) somehow inside the DLL
> -     2) in _sigfe (spinning is true) and about to enter cygwin DLL
> -     3) in a Windows DLL.  */
> -  if (incyg || spinning || inside_kernel (cx))
> +     2) in a Windows DLL.  */
> +  if (incyg || inside_kernel (cx))
>      interrupted = false;
>    else
>      {
> diff --git a/winsup/cygwin/local_includes/cygtls.h 
> b/winsup/cygwin/local_includes/cygtls.h
> index efbd557b1..2d490646a 100644
> --- a/winsup/cygwin/local_includes/cygtls.h
> +++ b/winsup/cygwin/local_includes/cygtls.h
> @@ -196,7 +196,6 @@ public: /* Do NOT remove this public: line, it's a marker 
> for gentls_offsets. */
>    waitq wq;
>    int current_sig;
>    unsigned incyg;
> -  unsigned spinning;
>    volatile unsigned stacklock;
>    __tlsstack_t *stackptr;
>    __tlsstack_t stack[TLS_STACK_SIZE];
> diff --git a/winsup/cygwin/scripts/gendef b/winsup/cygwin/scripts/gendef
> index 377ceb59b..521550175 100755
> --- a/winsup/cygwin/scripts/gendef
> +++ b/winsup/cygwin/scripts/gendef
> @@ -134,7 +134,6 @@ _sigfe:                                           # stack 
> is aligned on entry!
>       movq    %gs:8,%r10                      # location of bottom of stack
>  1:   movl    \$1,%r11d
>       xchgl   %r11d,_cygtls.stacklock(%r10)   # try to acquire lock
> -     movl    %r11d,_cygtls.spinning(%r10)    # flag if we are waiting for 
> lock
>       testl   %r11d,%r11d                     # it will be zero
>       jz      2f                              #  if so
>       pause
> @@ -158,7 +157,6 @@ _sigbe:                                           # 
> return here after cygwin syscall
>       movq    %gs:8,%r10                      # address of bottom of tls
>  1:   movl    \$1,%r11d
>       xchgl   %r11d,_cygtls.stacklock(%r10)   # try to acquire lock
> -     movl    %r11d,_cygtls.spinning(%r10)    # flag if we are waiting for 
> lock
>       testl   %r11d,%r11d                     # it will be zero
>       jz      2f                              #  if so
>       pause
> @@ -258,7 +256,6 @@ sigdelayed:
>  
>  1:   movl    \$1,%r11d
>       xchgl   %r11d,_cygtls.stacklock(%r12)   # try to acquire lock
> -     movl    %r11d,_cygtls.spinning(%r12)    # flag if we are waiting for 
> lock
>       testl   %r11d,%r11d                     # it will be zero
>       jz      2f                              #  if so
>       pause
> @@ -332,7 +329,6 @@ stabilize_sig_stack:
>       movq    %gs:8,%r12
>  1:   movl    \$1,%r10d
>       xchgl   %r10d,_cygtls.stacklock(%r12)   # try to acquire lock
> -     movl    %r10d,_cygtls.spinning(%r12)    # flag if we are waiting for 
> lock
>       testl   %r10d,%r10d
>       jz      2f
>       pause

Reply via email to