On 01/22/21 18:58, Max Reitz wrote: > On 22.01.21 17:38, Laszlo Ersek wrote: >> On 01/22/21 11:20, Max Reitz wrote: >>> Modifying signal handlers is a process-global operation. When two >>> threads run coroutine-sigaltstack's qemu_coroutine_new() concurrently, >>> they may interfere with each other: One of them may revert the SIGUSR2 >>> handler back to the default between the other thread setting up >>> coroutine_trampoline() as the handler and raising SIGUSR2. That SIGUSR2 >>> will then lead to the process exiting. >>> >>> Outside of coroutine-sigaltstack, qemu does not use SIGUSR2. We can >>> thus keep the signal handler installed all the time. >>> CoroutineThreadState.tr_handler tells coroutine_trampoline() whether its >>> stack is set up so a new coroutine is to be launched (i.e., it should >>> invoke sigsetjmp()), or not (i.e., the signal came from an external >>> source and we should just perform the default action, which is to exit >>> the process). >>> >>> Note that in user-mode emulation, the guest can register signal handlers >>> for any signal but SIGSEGV and SIGBUS, so if it registers a SIGUSR2 >>> handler, sigaltstack coroutines will break from then on. However, we do >>> not use coroutines for user-mode emulation, so that is fine. >>> >>> Suggested-by: Laszlo Ersek <ler...@redhat.com> >>> Signed-off-by: Max Reitz <mre...@redhat.com> >>> --- >>> util/coroutine-sigaltstack.c | 56 +++++++++++++++++++----------------- >>> 1 file changed, 29 insertions(+), 27 deletions(-) >> >> (1) With SIGUSR2 permanently dedicated to "coroutine-sigaltstack.c", the >> comment on the qemu_init_main_loop() declaration, in >> "include/qemu/main-loop.h", also needs to be updated. SIGUSR2 is no >> longer a "free for thread-internal usage" signal. > > It’s possible that I haven’t understood that comment, but I haven’t > adjusted it because I didn’t interpret it to mean that the signals > listed there were free to use. For example, SIGUSR1 (aliased to > SIG_IPI) is generated by cpus_kick_thread() and caught by KVM and HVF, > so it doesn’t seem like it would be free for thread-internal usage either. > > Instead, I understood it to mean that the signals listed there do not > have to be blocked by non-main threads, because it is OK for non-main > threads to catch them. I can’t think of a reason why SIGUSR2 should be > blocked by any thread, so I decided not to change the comment. > > But perhaps I just didn’t understand the whole comment. That’s > definitely possible, given that I don’t even see a place where non-main > threads would block the signals not listed there.
OK, then I agree that I don't understand the comment on qemu_init_main_loop() either. :) If you consciously decided not to change the comment, then I won't request otherwise. > >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8..2d32afc322 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -59,6 +59,8 @@ typedef struct { >>> >>> static pthread_key_t thread_state_key; >>> >>> +static void coroutine_trampoline(int signal); >>> + >>> static CoroutineThreadState *coroutine_get_thread_state(void) >>> { >>> CoroutineThreadState *s = pthread_getspecific(thread_state_key); >>> @@ -80,6 +82,7 @@ static void qemu_coroutine_thread_cleanup(void >>> *opaque) >>> >>> static void __attribute__((constructor)) coroutine_init(void) >>> { >>> + struct sigaction sa; >>> int ret; >>> >>> ret = pthread_key_create(&thread_state_key, >>> qemu_coroutine_thread_cleanup); >>> @@ -87,6 +90,20 @@ static void __attribute__((constructor)) >>> coroutine_init(void) >>> fprintf(stderr, "unable to create leader key: %s\n", >>> strerror(errno)); >>> abort(); >>> } >>> + >>> + /* >>> + * Establish the SIGUSR2 signal handler. This is a process-wide >>> + * operation, and so will apply to all threads from here on. >>> + */ >>> + sa = (struct sigaction) { >>> + .sa_handler = coroutine_trampoline, >>> + .sa_flags = SA_ONSTACK, >>> + }; >>> + >>> + if (sigaction(SIGUSR2, &sa, NULL) != 0) { >>> + perror("Unable to install SIGUSR2 handler"); >>> + abort(); >>> + } >>> } >>> >>> /* "boot" function >>> @@ -121,7 +138,17 @@ static void coroutine_trampoline(int signal) >>> /* Get the thread specific information */ >>> coTS = coroutine_get_thread_state(); >>> self = coTS->tr_handler; >>> + >>> + if (!self) { >>> + /* >>> + * This SIGUSR2 came from an external source, not from >>> + * qemu_coroutine_new(), so perform the default action. >>> + */ >>> + exit(0); >>> + } >> >> (2) exit() is generally unsafe to call in signal handlers. >> >> We could reason whether or not it is safe in this particular case (POSIX >> describes the exact conditions -- >> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03>), >> >> but it's much simpler to just call _exit(). >> >> >> (3) "Performing the default action" would be slightly different from >> calling _exit(). When a process is terminated with a signal, the parent >> can distinguish that, when reaping the child. See waitpid() / >> WIFSIGNALED() / WTERMSIG(), versus WIFEXITED() / WEXITSTATUS(). >> >> So for the "default action", we'd have to: >> - restore the SIGUSR2 handler to SIG_DFL, and >> - re-raise the signal for the thread, and >> - because the signal being handled is automatically blocked unless >> SA_NODEFER was set: unblock the signal for the thread. >> >> The pthread_sigmask() call, made for the last step, would be the one >> that would not return. >> >> *However*, all of this complexity is not really useful here. We don't >> really want to simulate being "forcefully terminated" by the unexpected >> (asynchronous) SIGUSR2. We just want to exit. >> >> Therefore, _exit() is fine. But, we should use an exit code different >> from 0, as this is definitely not a pristine exit from QEMU. I'm not >> sure if a convention exists for nonzero exit codes in QEMU; if not, then >> just pass EXIT_FAILURE to _exit(). > > I’m fine with calling _exit(). I hope, Eric is, too (as long as the > comment no longer claims this were the default behavior). > > Given that SIGUSR2 is not something that qemu is prepared to receive > from the outside, EXIT_FAILURE seems right to me. (Even abort() could > be justified, but I don’t think generating a core dump does any good here.) > > (As for qemu-specific exit code conventions, the only ones I know of are > for certain qemu-img subcommands. I’m sure you grepped, too, but I > can’t find anything for the system emulator.) > >> (4) Furthermore, please update the comment, because "perform the default >> action" is not precise. > > Sure, that’s definitely easier than to re-raise SIGUSR2. > >>> + >>> coTS->tr_called = 1; >>> + coTS->tr_handler = NULL; >>> co = &self->base; >>> >>> /* >> >> (5) I see that "tr_called" has type "volatile sig_atomic_t", which is >> great (I think that "sig_atomic_t" is not even necessary here, because >> of the careful signal masking that we do, and because the signal is >> raised synchronously). >> >> "volatile" is certainly justified though (the compiler may not be able >> to trace the call through the signal), and that's what I'm missing from >> "tr_handler" too. IMO, the "tr_handler" field should be decalered as >> follow: >> >> volatile void * volatile tr_handler; >> >> wherein the second "volatile" serves just the same purpose as the >> "volatile" in "tr_called", and the first "volatile" follows from *that* >> -- wherever we chase the *chain of pointers* rooted in "tr_handler" >> should not be optimized out by the compiler. >> >> But: my point (5) is orthogonal to this patch. In practice, it may not >> matter at all. So feel free to ignore now, I guess. > > I suppose signal handlers are indeed preempting functions (i.e., the > compiler is not aware of them executing). I overlooked that, given that > logically, we more or less explicitly invoke the SIGUSR2 handler, but of > course, technically, we just trigger the signal and the handler is then > invoked preemptively. > > I suspect the pthread_kill() function is sufficiently complex that the > compiler can’t prove that it won’t access *coTS (e.g. because perhaps it > contains a syscall?), but better be safe than sorry. > > But I don’t really like the “volatile void *volatile tr_handler”, > particularly the “volatile void *” combinations. I find volatile voids > weird. > > Why is tr_handler even a void pointer, and not a pointer to > CoroutineSigAltStack, if all it can point to is such an object? > “volatile CoroutineSigAltStack *” would make more sense to me. > > Given that perhaps the CoroutineSigAltStack object should be volatile, > shouldn’t also the CoroutineThreadState object be volatile? If it were, > we could drop the volatile from tr_called and wouldn’t need to make the > pointer value tr_handler volatile. Good point... > > > OTOH, if I look more through the code, making everything volatile seems > a bit excessive to me. I understand correctly that > sigsetjmp()/siglongjmp() act as barriers to the compiler, don’t they? Hmmm... https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html "I guess so". > > The only value that I can see the in-coroutine code writing that the > calling code reads (before the next sigsetjmp()) is tr_called. So > shouldn’t it be sufficient to insert a barrier() before the > pthread_kill(), and then it’d be sufficient to keep tr_called volatile, > but the rest could stay as it is? Well, I guess one could argue that pthread_kill() itself should work as a barrier. It's hard to find a reason why it should not be a barrier. > >>> @@ -150,12 +177,9 @@ Coroutine *qemu_coroutine_new(void) >>> { >>> CoroutineSigAltStack *co; >>> CoroutineThreadState *coTS; >>> - struct sigaction sa; >>> - struct sigaction osa; >>> stack_t ss; >>> stack_t oss; >>> sigset_t sigs; >>> - sigset_t osigs; >>> sigjmp_buf old_env; >>> >>> /* The way to manipulate stack is with the sigaltstack >>> function. We >>> @@ -172,24 +196,6 @@ Coroutine *qemu_coroutine_new(void) >>> co->stack = qemu_alloc_stack(&co->stack_size); >>> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >>> >>> - coTS = coroutine_get_thread_state(); >>> - coTS->tr_handler = co; >>> - >>> - /* >>> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >>> - * and establish our signal handler. The signal will >>> - * later transfer control onto the signal stack. >>> - */ >>> - sigemptyset(&sigs); >>> - sigaddset(&sigs, SIGUSR2); >>> - pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >>> - sa.sa_handler = coroutine_trampoline; >>> - sigfillset(&sa.sa_mask); >>> - sa.sa_flags = SA_ONSTACK; >>> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >>> - abort(); >>> - } >>> - >>> /* >>> * Set the new stack. >>> */ >>> @@ -207,6 +213,8 @@ Coroutine *qemu_coroutine_new(void) >>> * signal can be delivered the first time sigsuspend() is >>> * called. >>> */ >>> + coTS = coroutine_get_thread_state(); >>> + coTS->tr_handler = co; >>> coTS->tr_called = 0; >>> pthread_kill(pthread_self(), SIGUSR2); >>> sigfillset(&sigs); >>> @@ -230,12 +238,6 @@ Coroutine *qemu_coroutine_new(void) >>> sigaltstack(&oss, NULL); >>> } >>> >>> - /* >>> - * Restore the old SIGUSR2 signal handler and mask >>> - */ >>> - sigaction(SIGUSR2, &osa, NULL); >>> - pthread_sigmask(SIG_SETMASK, &osigs, NULL); >>> - >>> /* >>> * Now enter the trampoline again, but this time not as a signal >>> * handler. Instead we jump into it directly. The functionally >>> >> >> (6) This change, for qemu_coroutine_new(), is too heavy-handed, in my >> opinion. I suggest (a) removing only the sigaction() calls and their >> directly needed aux variables, and (b) *not* moving around operations. >> >> In particular, you remove the blocking of SIGUSR2 for the thread -- the >> pthread_sigmask() call. That means the sigsuspend() later on becomes >> superfluous, as the signal will not be delivered inside sigsuspend(), >> but inside pthread_kill(). With the signal not blocked, *generation* and >> *delivery* will coalesce into a single event. > > Are you saying that (a) is a problem because this is too heavy-handed of > a change for a single patch, or because it would actually be a problem > to deliver the signal inside pthread_kill()? The former: all the things together that the patch does to qemu_coroutine_new() are too heavy-weight for a single patch. They also render sigsuspend() useless, while keeping sigsuspend() in place. > > (Either way will result in me dropping the change from this patch, so > it’s just a question out of curiosity.) > > As for (b), just FYI, I deliberately moved around the operation, so that > tr_handler is only set once the coroutine stack is set up. Otherwise it > might be a problem if an external SIGUSR2 comes in before then. > > But if we keep SIGUSR2 blocked, there is no reason for that movement, > hence the “just FYI”. > >> The general logic should stay the same, only the signal action >> manipulation should be removed. IOW, for this function, I propose the >> following change only: >> >>> diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c >>> index aade82afb8c0..722fed7b2502 100644 >>> --- a/util/coroutine-sigaltstack.c >>> +++ b/util/coroutine-sigaltstack.c >>> @@ -149,107 +149,97 @@ static void coroutine_trampoline(int signal) >>> Coroutine *qemu_coroutine_new(void) >>> { >>> CoroutineSigAltStack *co; >>> CoroutineThreadState *coTS; >>> - struct sigaction sa; >>> - struct sigaction osa; >>> stack_t ss; >>> stack_t oss; >>> sigset_t sigs; >>> sigset_t osigs; >>> sigjmp_buf old_env; >>> >>> /* The way to manipulate stack is with the sigaltstack >>> function. We >>> * prepare a stack, with it delivering a signal to ourselves >>> and then >>> * put sigsetjmp/siglongjmp where needed. >>> * This has been done keeping coroutine-ucontext as a model and >>> with the >>> * pth ideas (GNU Portable Threads). See coroutine-ucontext for >>> the basics >>> * of the coroutines and see pth_mctx.c (from the pth project) >>> for the >>> * sigaltstack way of manipulating stacks. >>> */ >>> >>> co = g_malloc0(sizeof(*co)); >>> co->stack_size = COROUTINE_STACK_SIZE; >>> co->stack = qemu_alloc_stack(&co->stack_size); >>> co->base.entry_arg = &old_env; /* stash away our jmp_buf */ >>> >>> coTS = coroutine_get_thread_state(); >>> coTS->tr_handler = co; >>> >>> /* >>> - * Preserve the SIGUSR2 signal state, block SIGUSR2, >>> - * and establish our signal handler. The signal will >>> - * later transfer control onto the signal stack. >>> + * Block SIGUSR2. The signal will later transfer control onto >>> the signal >>> + * stack. >>> */ >>> sigemptyset(&sigs); >>> sigaddset(&sigs, SIGUSR2); >>> pthread_sigmask(SIG_BLOCK, &sigs, &osigs); >>> - sa.sa_handler = coroutine_trampoline; >>> - sigfillset(&sa.sa_mask); >>> - sa.sa_flags = SA_ONSTACK; >>> - if (sigaction(SIGUSR2, &sa, &osa) != 0) { >>> - abort(); >>> - } >>> >>> /* >>> * Set the new stack. >>> */ >>> ss.ss_sp = co->stack; >>> ss.ss_size = co->stack_size; >>> ss.ss_flags = 0; >>> if (sigaltstack(&ss, &oss) < 0) { >>> abort(); >>> } >>> >>> /* >>> * Now transfer control onto the signal stack and set it up. >>> * It will return immediately via "return" after the sigsetjmp() >>> * was performed. Be careful here with race conditions. The >>> * signal can be delivered the first time sigsuspend() is >>> * called. >>> */ >>> coTS->tr_called = 0; >>> pthread_kill(pthread_self(), SIGUSR2); >>> sigfillset(&sigs); >>> sigdelset(&sigs, SIGUSR2); >>> while (!coTS->tr_called) { >>> sigsuspend(&sigs); >>> } >>> >>> /* >>> * Inform the system that we are back off the signal stack by >>> * removing the alternative signal stack. Be careful here: It >>> * first has to be disabled, before it can be removed. >>> */ >>> sigaltstack(NULL, &ss); >>> ss.ss_flags = SS_DISABLE; >>> if (sigaltstack(&ss, NULL) < 0) { >>> abort(); >>> } >>> sigaltstack(NULL, &ss); >>> if (!(oss.ss_flags & SS_DISABLE)) { >>> sigaltstack(&oss, NULL); >>> } >>> >>> /* >>> - * Restore the old SIGUSR2 signal handler and mask >>> + * Restore the old mask >>> */ >>> - sigaction(SIGUSR2, &osa, NULL); >>> pthread_sigmask(SIG_SETMASK, &osigs, NULL); >>> >>> /* >>> * Now enter the trampoline again, but this time not as a signal >>> * handler. Instead we jump into it directly. The functionally >>> * redundant ping-pong pointer arithmetic is necessary to avoid >>> * type-conversion warnings related to the `volatile' qualifier >>> and >>> * the fact that `jmp_buf' usually is an array type. >>> */ >>> if (!sigsetjmp(old_env, 0)) { >>> siglongjmp(coTS->tr_reenter, 1); >>> } >>> >>> /* >>> * Ok, we returned again, so now we're finished >>> */ >>> >>> return &co->base; >>> } >> >> >> (7) The sigaction() call has not been moved entirely correctly from >> qemu_coroutine_new() to coroutine_init(), in my opinion. >> >> Namely, the original call site fills the "sa_mask" member, meaning that, >> while the signal handler is executing, not only SIGUSR2 itself should be >> blocked, but *all* signals. >> >> This is missing from the new call site, in coroutine_init() -- the >> "sa_mask" member is left zeroed. >> >> Now, in practice, this may not matter a whole lot, because "sa_mask" is >> additive, and at the only place we allow the signal to be delivered, >> namely in sigsuspend(), we already have everything blocked, except for >> SIGUSR2. So when "sa_mask" is ORed with the set of blocked signals, upon >> delivery of SIGUSR2, there is no actual change to the signal mask. >> >> *But*, I feel that such a change would really deserve its own argument, >> i.e. a separate patch, or at least a separate paragraph in the commit >> message. And I suggest not doing those; instead please just faithfully >> transfer the "sa_mask" setting too, to coroutine_init(). (My impression >> is that the effective removal of the "sa_mask" population was an >> oversight in this patch, not a conscious decision.) > > Yes, it was totally an oversight. > > Thanks a lot for your detailed review! I have absolutely no experience > with coroutine-sigaltstack in particular, and very little experience > with signal handling in projects where correctness actually matters. I’m > grateful that you inspect this patch with great scrutiny. > > (Btw, that’s why I was very close to just ending a new version of the > mutex patch just with the commit message adjusted, because that felt > like I could do much less wrong than here. Turns out I was right. O:)) TBH I started liking the mutex version too. :) It's possible we're working just too hard for solving this issue. Laszlo