On Wed, Nov 15, 2023 at 10:30:29AM -0500, Steven Sistare wrote: > > > On 11/6/2023 5:10 AM, Daniel P. Berrangé wrote: > > On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote: > >> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote: > >>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote: > >>>> Buffered monitor output is lost when abort() is called. The pattern > >>>> error_report() followed by abort() occurs about 60 times, so valuable > >>>> information is being lost when the abort is called in the context of a > >>>> monitor command. > >>> > >>> I'm curious, was there a particular abort() scenario that you hit ? > >> > >> Yes, while tweaking the suspended state, and forgetting to add transitions: > >> > >> error_report("invalid runstate transition: '%s' -> '%s'", > >> abort(); > >> > >> But I have previously hit this for other errors. > >> > >>> For some crude statistics: > >>> > >>> $ for i in abort return exit goto ; do echo -n "$i: " ; git grep > >>> --after 1 error_report | grep $i | wc -l ; done > >>> abort: 47 > >>> return: 512 > >>> exit: 458 > >>> goto: 177 > >>> > >>> to me those numbers say that calling "abort()" after error_report > >>> should be considered a bug, and we can blanket replace all the > >>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to > >>> special case flushing the monitor. > >> > >> And presumably add an atexit handler to flush the monitor ala > >> monitor_abort. > >> AFAICT currently no destructor is called for the monitor at exit time. > > > > The HMP monitor flushes at each newline, and exit() will take care of > > flushing stdout, so I don't think there's anything else needed. > > > >>> Also I think there's a decent case to be made for error_report() > >>> to call monitor_flush(). > >> > >> A good start, but that would not help for monitors with skip_flush=true, > >> which > >> need to format the buffered string in a json response, which is the case I > >> tripped over. > > > > 'skip_flush' is only set to 'true' when using a QMP monitor and invoking > > "hmp-monitor-command". > > OK, that is narrower than I thought. Now I see that other QMP monitors send > error_report() to stderr, hence it is visible after abort and exit: > > int error_vprintf(const char *fmt, va_list ap) { > if (cur_mon && !monitor_cur_is_qmp()) > return monitor_vprintf(cur_mon, fmt, ap); > return vfprintf(stderr, fmt, ap); <-- HERE > > That surprises me, I thought that would be returned to the monitor caller in > the > json response. I guess the rationale is that the "main" error, if any, will be > set and returned by the err object that is passed down the stack during > command > evaluation. > > > In such a case, the error message needs to be built into a JSON error > > reply and sent over the socket. Your patch doesn't help this case > > since you've just printed to stderr. > > Same as vfprintf above! > > > I don't think it is reasonable > > to expect QMP monitors to send replies on SIG_ABRT anyway. > > I agree. My patch causes the error to be seen somewhere, anywhere, instead > of being dropped on the floor. > > > So I don't > > think the skip_flush=true scenario is a problem to be concerned with. > > It is indeed a narrow case, and not worth much effort or code change. > I'm inclined to drop it, but I appreciate the time you have spent reviewing > it.
If you know of scenarios that can trigger abort() as a result of either QMP or HMP commands, then we still need to fix those. Nothing that is reachable from the monitor commands should ever end up in abort(), as a general rule. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|