On 01/22/21 11:20, Max Reitz wrote: > Modifying signal handlers is a process-global operation. When two > threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, > they may interfere with each other: One of them may revert the SIGUSR2 > handler back to the default between the other thread setting up > coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 > will then lead to the process exiting. > > Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can > thus keep the signal handler installed all the time. > CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its > stack is set up so a new coroutine is to be launched (i.e., it should > invoke sigsetjmp()), or not (i.e., the signal came from an external > source and we should just perform the default action, which is to exit > the process). > > Note that in user-mode emulation, the guest can register signal handlers > for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 > handler, sigaltstack coroutines will break from then on. However, we do > not use coroutines for user-mode emulation, so that is fine. > > Suggested-by: Laszlo Ersek <ler...@redhat.com> > Signed-off-by: Max Reitz <mre...@redhat.com> > --- > util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- > 1 file changed, 29 insertions(+), 27 deletions(-) > > diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c > index aade82afb8..2d32afc322 100644 > --- a/util/coroutine-sigaltstack.c > +++ b/util/coroutine-sigaltstack.c > @@ -59,6 +59,8 @@ typedef struct { > > static pthread_key_t thread_state_key; > > +static void coroutine_trampoline(int signal); > + > static CoroutineThreadState *coroutine_get_thread_state(void) > { > CoroutineThreadState *s = pthread_getspecific(thread_state_key); > @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void *opaque) > > static void __attribute__((constructor)) coroutine_init(void) > { > + struct sigaction sa; > int ret; > > ret = pthread_key_create(&thread_state_key, > qemu_coroutine_thread_cleanup); > @@ -87,6 +90,20 @@ static void __attribute__((constructor)) > coroutine_init(void) > fprintf(stderr, "unable to create leader key: %s\n", > strerror(errno)); > abort(); > } > + > + /* > + * Establish the SIGUSR2 signal handler. This is a process-wide > + * operation, and so will apply to all threads from here on. > + */ > + sa = (struct sigaction) { > + .sa_handler = coroutine_trampoline, > + .sa_flags = SA_ONSTACK, > + }; > + > + if (sigaction(SIGUSR2, &sa, NULL) != 0) { > + perror("Unable to install SIGUSR2 handler"); > + abort(); > + } > } > > /* "boot" function > @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) > /* Get the thread specific information */ > coTS = coroutine_get_thread_state(); > self = coTS->tr_handler; > + > + if (!self) { > + /* > + * This SIGUSR2 came from an external source, not from > + * qemu_coroutine_new(), so perform the default action. > + */ > + exit(0); > + } > + > coTS->tr_called = 1; > + coTS->tr_handler = NULL; > co = &self->base; > > /*
(8) There's a further complication here, assuming we really want to recognize the case when the handler is executing unexpectedly: - pthread_getspecific() is not necessarily async-signal-safe, according to POSIX, so calling coroutine_get_thread_state() in the "unexpected" case (e.g. in response to an asynchronously generated SIGUSR2) is problematic in its own right, - if the SIGUSR2 is delivered to a thread that has never called coroutine_get_thread_state() before, then we'll reach g_malloc0() inside coroutine_get_thread_state(), in signal handler context, which is very bad. You'd have to block SIGUSR2 for the entire process (all threads) at all times, and only temporarily unblock it for a particular coroutine thread, with the sigsuspend(). The above check would suffice, that way. Such blocking is possible by calling pthread_sigmask() from the main thread, before any other thread is created (the signal mask is inherited across pthread_create()). I guess it could be done in coroutine_init() too. And *then* the pthread_sigmask() calls should indeed be removed from qemu_coroutine_new(). (Apologies if my feedback is difficult to understand, it's my fault. I could propose a patch, if (and only if) you want that.) Thanks Laszlo > @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) > { > CoroutineSigAltStack *co; > CoroutineThreadState *coTS; > - struct sigaction sa; > - struct sigaction osa; > stack_t ss; > stack_t oss; > sigset_t sigs; > - sigset_t osigs; > sigjmp_buf old_env; > > /* The way to manipulate stack is with the sigaltstack function. We > @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) > co->stack = qemu_alloc_stack(&co->stack_size); > co->base.entry_arg = &old_env; /* stash away our jmp_buf */ > > - coTS = coroutine_get_thread_state(); > - coTS->tr_handler = co; > - > - /* > - * Preserve the SIGUSR2 signal state, block SIGUSR2, > - * and establish our signal handler. The signal will > - * later transfer control onto the signal stack. > - */ > - sigemptyset(&sigs); > - sigaddset(&sigs, SIGUSR2); > - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); > - sa.sa_handler = coroutine_trampoline; > - sigfillset(&sa.sa_mask); > - sa.sa_flags = SA_ONSTACK; > - if (sigaction(SIGUSR2, &sa, &osa) != 0) { > - abort(); > - } > - > /* > * Set the new stack. > */ > @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) > * signal can be delivered the first time sigsuspend() is > * called. > */ > + coTS = coroutine_get_thread_state(); > + coTS->tr_handler = co; > coTS->tr_called = 0; > pthread_kill(pthread_self(), SIGUSR2); > sigfillset(&sigs); > @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) > sigaltstack(&oss, NULL); > } > > - /* > - * Restore the old SIGUSR2 signal handler and mask > - */ > - sigaction(SIGUSR2, &osa, NULL); > - pthread_sigmask(SIG_SETMASK, &osigs, NULL); > - > /* > * Now enter the trampoline again, but this time not as a signal > * handler. Instead we jump into it directly. The functionally >