On Wed, Nov 15, 2023 at 10:30:29AM -0500, Steven Sistare wrote:
> 
> 
> On 11/6/2023 5:10 AM, Daniel P. Berrangé wrote:
> > On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
> >> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
> >>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
> >>>> Buffered monitor output is lost when abort() is called.  The pattern
> >>>> error_report() followed by abort() occurs about 60 times, so valuable
> >>>> information is being lost when the abort is called in the context of a
> >>>> monitor command.
> >>>
> >>> I'm curious, was there a particular abort() scenario that you hit ?
> >>
> >> Yes, while tweaking the suspended state, and forgetting to add transitions:
> >>
> >>         error_report("invalid runstate transition: '%s' -> '%s'",
> >>         abort();
> >>
> >> But I have previously hit this for other errors.
> >>
> >>> For some crude statistics:
> >>>
> >>>   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep 
> >>> --after 1 error_report | grep $i | wc -l ; done
> >>>   abort: 47
> >>>   return: 512
> >>>   exit: 458
> >>>   goto: 177
> >>>
> >>> to me those numbers say that calling "abort()" after error_report
> >>> should be considered a bug, and we can blanket replace all the
> >>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
> >>> special case flushing the monitor.
> >>
> >> And presumably add an atexit handler to flush the monitor ala 
> >> monitor_abort.
> >> AFAICT currently no destructor is called for the monitor at exit time.
> > 
> > The HMP monitor flushes at each newline,  and exit() will take care of
> > flushing stdout, so I don't think there's anything else needed.
> > 
> >>> Also I think there's a decent case to be made for error_report()
> >>> to call monitor_flush().
> >>
> >> A good start, but that would not help for monitors with skip_flush=true, 
> >> which 
> >> need to format the buffered string in a json response, which is the case I 
> >> tripped over.
> > 
> > 'skip_flush' is only set to 'true' when using a QMP monitor and invoking
> > "hmp-monitor-command".
> 
> OK, that is narrower than I thought.  Now I see that other QMP monitors send 
> error_report() to stderr, hence it is visible after abort and exit:
> 
> int error_vprintf(const char *fmt, va_list ap) {
>     if (cur_mon && !monitor_cur_is_qmp())
>         return monitor_vprintf(cur_mon, fmt, ap);
>     return vfprintf(stderr, fmt, ap);                <-- HERE
> 
> That surprises me, I thought that would be returned to the monitor caller in 
> the
> json response. I guess the rationale is that the "main" error, if any, will be
> set and returned by the err object that is passed down the stack during 
> command
> evaluation.
> 
> > In such a case, the error message needs to be built into a JSON error
> > reply and sent over the socket. Your patch doesn't help this case
> > since you've just printed to stderr.  
> 
> Same as vfprintf above!
> 
> > I don't think it is reasonable
> > to expect QMP monitors to send replies on SIG_ABRT anyway. 
> 
> I agree.  My patch causes the error to be seen somewhere, anywhere, instead
> of being dropped on the floor.
> 
> > So I don't
> > think the skip_flush=true scenario is a problem to be concerned with.
> 
> It is indeed a narrow case, and not worth much effort or code change.
> I'm inclined to drop it, but I appreciate the time you have spent reviewing 
> it.

If you know of scenarios that can trigger abort() as a result of
either QMP or HMP commands, then we still need to fix those. Nothing
that is reachable from the monitor commands should ever end up in
abort(), as a general rule.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Reply via email to