On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote: > On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote: >> Buffered monitor output is lost when abort() is called. The pattern >> error_report() followed by abort() occurs about 60 times, so valuable >> information is being lost when the abort is called in the context of a >> monitor command. > > I'm curious, was there a particular abort() scenario that you hit ?
Yes, while tweaking the suspended state, and forgetting to add transitions: error_report("invalid runstate transition: '%s' -> '%s'", abort(); But I have previously hit this for other errors. > For some crude statistics: > > $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 > error_report | grep $i | wc -l ; done > abort: 47 > return: 512 > exit: 458 > goto: 177 > > to me those numbers say that calling "abort()" after error_report > should be considered a bug, and we can blanket replace all the > abort() calls with exit(EXIT_FAILURE), and thus avoid the need to > special case flushing the monitor. And presumably add an atexit handler to flush the monitor ala monitor_abort. AFAICT currently no destructor is called for the monitor at exit time. > Also I think there's a decent case to be made for error_report() > to call monitor_flush(). A good start, but that would not help for monitors with skip_flush=true, which need to format the buffered string in a json response, which is the case I tripped over. >> To fix, install a SIGABRT handler to flush the monitor buffer to stderr. >> >> Signed-off-by: Steve Sistare <steven.sist...@oracle.com> >> --- >> monitor/monitor.c | 38 ++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 38 insertions(+) >> >> diff --git a/monitor/monitor.c b/monitor/monitor.c >> index dc352f9..65dace0 100644 >> --- a/monitor/monitor.c >> +++ b/monitor/monitor.c >> @@ -701,6 +701,43 @@ void monitor_cleanup(void) >> } >> } >> >> +#ifdef CONFIG_LINUX >> + >> +static void monitor_abort(int signal, siginfo_t *info, void *c) >> +{ >> + Monitor *mon = monitor_cur(); >> + >> + if (!mon || qemu_mutex_trylock(&mon->mon_lock)) { >> + return; >> + } >> + >> + if (mon->outbuf && mon->outbuf->len) { >> + fputs("SIGABRT received: ", stderr); >> + fputs(mon->outbuf->str, stderr); >> + if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') { >> + fputc('\n', stderr); >> + } >> + } >> + >> + qemu_mutex_unlock(&mon->mon_lock); > > The SIGABRT handling does not only fire in response to abort() > calls, but also in response to bad memory scenarios, so we have > to be careful what we do in signal handlers. > > In particular using mutexes in signal handlers is a big red > flag generally. Mutex APIs are not declare async signal > safe, so this code is technically a POSIX compliance > violation. Righto. I would need to mask all signals in the sigaction to be on the safe(r) side. > So I think we'd be safer just eliminating the explicit abort() > calls and adding monitor_flush call to error_report. I like adding a handler because it is future proof. No need to play whack-a-mole when developers re-introduce abort() calls in the future. A minor benefit is I would not need ack's from 50 maintainers to change 50 call sites from abort to exit. A slight risk of the exit solution is that something bad happened at the call site, so qemu state can no longer be trusted. Calling abort immediately may be safer than calling exit which will call the existing atexit handlers and could have side effects. A third option is to define qemu_abort() which flushes the monitor, and replaces all abort calls. That avoids async-signal-mutex hand wringing, but is still subject to whack-a-mole. So: atexit, signal handler, or qemu_abort? I will go with your preference. - Steve >> +} >> + >> +static void monitor_add_abort_handler(void) >> +{ >> + struct sigaction act; >> + >> + memset(&act, 0, sizeof(act)); >> + act.sa_sigaction = monitor_abort; >> + act.sa_flags = SA_SIGINFO; >> + sigaction(SIGABRT, &act, NULL); >> +} >> + >> +#else >> + >> +static void monitor_add_abort_handler(void) {} >> + >> +#endif >> + >> static void monitor_qapi_event_init(void) >> { >> monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash, >> @@ -712,6 +749,7 @@ void monitor_init_globals(void) >> monitor_qapi_event_init(); >> qemu_mutex_init(&monitor_lock); >> coroutine_mon = g_hash_table_new(NULL, NULL); >> + monitor_add_abort_handler(); >> >> /* >> * The dispatcher BH must run in the main loop thread, since we >> -- >> 1.8.3.1 >> >> > > With regards, > Daniel