Re: [Qemu-devel] chardev's and fd's in monitors

Dr. David Alan Gilbert Thu, 20 Oct 2016 11:11:09 -0700

* Markus Armbruster (arm...@redhat.com) wrote:
> "Daniel P. Berrange" <berra...@redhat.com> writes:
> 
> > On Thu, Oct 20, 2016 at 12:42:01PM +0200, Markus Armbruster wrote:
> >> "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes:
> >> 
> >> > * Daniel P. Berrange (berra...@redhat.com) wrote:
> >> >> On Thu, Oct 20, 2016 at 10:55:52AM +0200, Markus Armbruster wrote:
> >> >> 
> >> >> Our code has increasingly converted to propagate errors up the call
> >> >> chain, but having a mix of different error reporting approaches
> >> >> is increasingly causing pain.
> >> >> 
> >> >> eg a function which propagates errors wants to call into a function
> >> >> whicih uses error_report
> >> 
> >> When you add an Error * parameter to a function, you get to convert
> >> everything it calls.  Failure to do so is a bug, simple as that.
> >> 
> >> Likewise, when you add a new call to a function that takes an Error *
> >> parameter.
> >
> > I agree its a bug - just that from a pragmatic POV it isn't always
> > practical to convert the entire call chain in one big bang - particularly
> > if you are working on shared infrastructure.
> 
> Been there, felt the pain.  I just wish we'd be more diligent adding
> FIXME comments when we take shortcuts.
> 
> >                                              For example, the
> > qemu-sockets.c APIs for connecting & listening on unix/tcp sockets
> > use Error * to report errors. Those APIs were used in the block
> > layer, migration, vnc, chardev and network backend code. It clearly
> > isn't practical to convert all those regions to code to correctly
> > propagate Error ** in one go. That's certainly the desired end
> > goal IMHO, but getting there is going to take a while.
> >
> > This slow conversion has been going on for a long time now, so
> > maybe we need to make an aggressive push in strategic areas of
> > the code to stop it dragging out indefinitely. This is kind of
> > a general problem we have in QEMU - we have introduced lots of
> > new standard frameworks, but very rarely finish converting
> > code to use them.
> 
> Yep.
> 
> >                    This is why I did a big push to try to get
> > everything switched over to use QIOChannel, in the shortest
> > possible time, rather than doing only small bits over many
> > releases.
> 
> Much better when you can pull it off.
> 
> A common road block is having to update code nobody wants to touch, let
> alone maintain, with no way to test.  The easy way out is to update just
> the code you actually care about, then call the conversion process
> "incremental".  Well, it's not incremental without commitments to
> increment.


I don't think we should do this until we can come up with a scheme
that has less impact on the code being modified.

> >> >>                          - there's no nice way to propagate the error
> >> >> since it has already been reported.  If the function then wants to
> >> >> explicitly ignore the error, then that's impossible too,since it has
> >> >> already been reported.  Add in our code which doesn't use error_report
> >> >> and instead returns errno values, such as the block layer, and it gets
> >> >> even worse because if that calls a function which propagates an error,
> >> >> it has to throw away that useful error and return a useless invented
> >> >> errno value :(
> >> 
> >> Different bug: when you receive an Error, you have to either handle and
> >> consume it, or pass it on.  Throwing it away and returning an error code
> >> instead counts as neither, and is a bug.
> >
> > Totally agree its a bug - just again hits the reality of having to
> > do major code conversions.
> >
> >> >> IMHO continuing to convert code to propagate errors is the only way
> >> >> out of this swamp, because it provides the greatest flexibility to
> >> >> the callers of said functions to decide how to deal with the error.
> >> 
> >> I wouldn't call the whole situation a swamp.  error_report() is just
> >> fine in many places.  So is returning -errno.  We convert to Error when
> >> these techniques cease to be fine.  The swampy part is old code where
> >> these techniques have never been fine, or at least not for a while.  To
> >> be drained incrementally.
> >
> > Realistically all the major backend subsystems (chardev, network, block,
> > ui and migration) need to be converted to Error ** propagation, since
> > they all ultimately call into some common code that reports Error **.
> 
> Infrastucture generally doesn't know how it's used, which means
> error_report() is generally wrong there.  Sufficiently simple functions
> can keep returning -errno, null, whatever, but the interesting stuff
> needs to use Error.
> > Very few places will end up being able to stick with -errno, or plain
> > error_report in the long term.
> 
> Not sure about "very few".  Less than now.  We'll see.

I'd also prefer we got the very-few level; Migration used to be
characterised by getting a 'load of migration failed -22' and having
no clue in the logs to why; I've slowly fought back to be able
to get an error from the lowest level that caused the failure.
I want more of that, so that when someone gets a rare failure in the field
I can see why.

Dave

> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] chardev's and fd's in monitors

Reply via email to