On Thu, 13 Mar 2025 20:42:52 +0900
Takashi Yano wrote:
> Hi Corinna,
> 
> On Thu, 13 Mar 2025 10:40:48 +0100
> Christian Franke wrote:
> > Corinna Vinschen via Cygwin wrote:
> > > On Mar 12 17:06, Corinna Vinschen via Cygwin wrote:
> > >> On Mar 12 16:30, Corinna Vinschen via Cygwin wrote:
> > >>> On Mar 11 12:32, Christian Franke via Cygwin wrote:
> > >>>> The attached testcase should test the following use cases of setcontext:
> > >>>> - call from regular user space
> > >>>> - call from a signal handler interrupting user space
> > >>>> - call from a signal handler interrupting a system call
> > >>>>
> > >>>> It works as expected ... until the signal count reaches 256. Then
> > >>>> signals are again only delivered from inside of a system call.
> > >>>> [...]
> > >>>> Interesting... Hmm... is there some 8-bit counter which overflows and
> > >>>> then gets stuck at 0xff or 0x00?
> > >>> It's a kind of stack overflow.  Kind of, because it's not the normal
> > >>> thread stack, but a special signal stack in the _cygtls area.
> > >>>
> > >>> When interrupting a running thread to call a signal handler, the context
> > >>> of the thread is changed to restart execution in an assembler function
> > >>> called sigdelayed().  The original IP of the thread is pushed on the
> > >>> aforementioned signal stack.  Sigdelayed() calls the signal handler.  On
> > >>> return it pops the original IP from the signal stack and continues the
> > >>> thread.
> > >>>
> > >>> Now guess what happens if the signal handler bails out with longjmp or
> > >>> setcontext/swapcontext.
> > >>>
> > >>> The signal handler never returns to the sigdelayed() function, the
> > >>> original address is never popped from the signal stack, and the signal
> > >>> stack has a max. size of 256 address entries...
> > >>>
> > >>> Theoretically, a small update to sigdelayed() would fix the issue: rather
> > >>> than popping the original IP from the signal stack after calling the
> > >>> handler, it should pop the IP prior to calling the handler.  That would
> > >>> avoid filling up the signal stack when long-jumping out of the signal
> > >>> handler.  It should store the IP in one of the callee-saved registers.
> > >>> %r13 is unused in sigdelayed so far.
> > >>>
> > >>> However, even if we do this, there's still the problem that sigdelayed()
> > >>> itself takes space on the stack.  If you longjmp/setcontext out of the
> > >>> handler, the thread's normal stack will fill up with dead storage of the
> > >>> sigdelayed() function, and there's no way out of this trap.  We can't
> > >>> restore the stack before the handler returns.
> > >>>
> > >>> So either way, at one point you get a stack overflow one way or the
> > >>> other.
> > >>>
> > >>> The signal stack overflow is actually rather harmless in comparison
> > >>> to a real stack overflow.
> > >>>
> > >>> If you have any idea how to avoid the real stack overflow, I'd be
> > >>> all ears.
> > >> Looks like this isn't really a problem with setcontext.  It always
> > >> corrects the stack pointer as well.  Apparently I haven't thought
> > >> long enough about this.
> > >>
> > >> I have a patch for sigdelayed() in the loop, stay tuned.
> > > Just pushed.  Try cygwin-3.6.0-0.430.ga942476236b5 in a bit.
> > 
> > The problem no longer occurs. Also tested with 'kill -INT PID && sleep
> > 0.01' in a loop.
> 
> After the commit:
> 
> commit a942476236b5e39bf30c533d08df7392e326a4c6 (origin/master, origin/main, origin/HEAD)
> Author: Corinna Vinschen <cori...@vinschen.de>
> Date:   Wed Mar 12 17:17:31 2025 +0100
> 
>     Cygwin: sigdelayed: pop return address from signal stack earlier
> 
> Christian's test case, timersig.c, no longer works, even with my v3 patches.
> I suspect this is because pop() and retaddr() no longer work as intended in
> call_signal_handler() with this commit.
> 
> Could you please have a look?


What about the following patch instead of your sigdelayed patch?

diff --git a/winsup/cygwin/exceptions.cc b/winsup/cygwin/exceptions.cc
index c9fe6a386..ceb47e52e 100644
--- a/winsup/cygwin/exceptions.cc
+++ b/winsup/cygwin/exceptions.cc
@@ -1758,6 +1758,13 @@ _cygtls::call_signal_handler ()
       reset_signal_arrived ();
       incyg = false;
       current_sig = 0; /* Flag that we can accept another signal */
+
+      /* We have to fetch the original return address from the signal stack
+        prior to calling the signal handler.  This avoids filling up the
+        signal stack if the signal handler longjumps (longjmp/setcontext). */
+      DWORD64 retaddr1 = pop ();
+      DWORD64 retaddr2 = stackptr > stack ? retaddr () : 0;
+      __tlsstack_t *ptr = stackptr;
       unlock ();       /* unlock signal stack */
 
       /* Alternate signal stack requested for this signal and alternate signal
@@ -1834,6 +1841,26 @@ _cygtls::call_signal_handler ()
           signal handler. */
        thisfunc (thissig, &thissi, thiscontext);
 
+      lock ();
+      if (stackptr == ptr)
+       push (retaddr1);
+      else if (stackptr == ptr + 1)
+       {
+         DWORD64 retaddr3 = pop();
+         push (retaddr1);
+         push (retaddr3);
+       }
+      else if (stackptr == ptr - 1)
+       {
+         if (retaddr2)
+           push (retaddr2);
+         else
+           stackptr++;
+       }
+      else
+       api_fatal ("Signal stack corrupted?.");
+      unlock ();
+
       incyg = true;
 
       set_signal_mask (_my_tls.sigmask, (this_sa_flags & SA_SIGINFO)
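
To illustrate the idea behind the patch, here is a minimal standalone sketch
of the bookkeeping. It is only a toy model, not Cygwin code: ToySigStack,
restore_after_handler() and deliveries_before_overflow() are hypothetical
names, the stack is indexed by an int rather than a pointer, and a handler
that leaves via longjmp/setcontext is modeled simply by skipping the
post-handler code. The capacity of 256 entries is taken from the discussion
above.

```cpp
#include <cstdint>

// Toy stand-in for the per-thread signal stack; not the real _cygtls layout.
struct ToySigStack {
    static constexpr int kSize = 256;   // capacity mentioned in the thread
    std::uint64_t slot[kSize];
    int top = 0;                        // index of the next free slot

    bool push(std::uint64_t addr) {
        if (top == kSize) return false; // stack is full
        slot[top++] = addr;
        return true;
    }
    std::uint64_t pop() { return slot[--top]; }
};

// Post-handler step modeled on the three branches of the patch.
// Returns false where the patch would call api_fatal().
bool restore_after_handler(ToySigStack &s, int ptr,
                           std::uint64_t retaddr1, std::uint64_t retaddr2) {
    if (s.top == ptr) {
        s.push(retaddr1);               // nothing changed: put the address back
    } else if (s.top == ptr + 1) {
        std::uint64_t retaddr3 = s.pop();
        s.push(retaddr1);               // re-insert below the newer entry
        s.push(retaddr3);
    } else if (s.top == ptr - 1) {
        if (retaddr2) s.push(retaddr2); else s.top++;
    } else {
        return false;
    }
    return true;
}

// Simulates `signals` deliveries and returns how many succeeded before the
// toy stack overflowed (or `signals` if it never did).
int deliveries_before_overflow(int signals, bool pop_before_handler,
                               bool handler_escapes) {
    ToySigStack s;
    for (int i = 0; i < signals; ++i) {
        if (!s.push(0x1000u + i)) return i;   // interruption saves the address
        std::uint64_t retaddr1 = 0, retaddr2 = 0;
        int ptr = 0;
        if (pop_before_handler) {             // what the patch adds
            retaddr1 = s.pop();
            retaddr2 = s.top > 0 ? s.slot[s.top - 1] : 0;
            ptr = s.top;
        }
        // The signal handler would run here.
        if (handler_escapes) continue;        // longjmp/setcontext: rest is skipped
        if (pop_before_handler &&
            !restore_after_handler(s, ptr, retaddr1, retaddr2))
            return i;
        s.pop();                              // return path consumes the address
    }
    return signals;
}
```

In this model, deliveries_before_overflow (1000, false, true) stops after 256
deliveries, which matches the symptom in the original report, while
deliveries_before_overflow (1000, true, true) gets through all 1000 because
the entry is removed before the handler can escape.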

-- 
Takashi Yano <takashi.y...@nifty.ne.jp>
