* Markus Armbruster (arm...@redhat.com) wrote: > "Daniel P. Berrange" <berra...@redhat.com> writes: > > > On Thu, Oct 20, 2016 at 12:42:01PM +0200, Markus Armbruster wrote: > >> "Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > >> > >> > * Daniel P. Berrange (berra...@redhat.com) wrote: > >> >> On Thu, Oct 20, 2016 at 10:55:52AM +0200, Markus Armbruster wrote: > >> >> > >> >> Our code has increasingly converted to propagate errors up the call > >> >> chain, but having a mix of different error reporting approaches > >> >> is increasingly causing pain. > >> >> > >> >> eg a function which propagates errors wants to call into a function > >> >> whicih uses error_report > >> > >> When you add an Error * parameter to a function, you get to convert > >> everything it calls. Failure to do so is a bug, simple as that. > >> > >> Likewise, when you add a new call to a function that takes an Error * > >> parameter. > > > > I agree its a bug - just that from a pragmatic POV it isn't always > > practical to convert the entire call chain in one big bang - particularly > > if you are working on shared infrastructure. > > Been there, felt the pain. I just wish we'd be more diligent adding > FIXME comments when we take shortcuts. > > > For example, the > > qemu-sockets.c APIs for connecting & listening on unix/tcp sockets > > use Error * to report errors. Those APIs were used in the block > > layer, migration, vnc, chardev and network backend code. It clearly > > isn't practical to convert all those regions to code to correctly > > propagate Error ** in one go. That's certainly the desired end > > goal IMHO, but getting there is going to take a while. > > > > This slow conversion has been going on for a long time now, so > > maybe we need to make an aggressive push in strategic areas of > > the code to stop it dragging out indefinitely. This is kind of > > a general problem we have in QEMU - we have introduced lots of > > new standard frameworks, but very rarely finish converting > > code to use them. > > Yep. > > > This is why I did a big push to try to get > > everything switched over to use QIOChannel, in the shortest > > possible time, rather than doing only small bits over many > > releases. > > Much better when you can pull it off. > > A common road block is having to update code nobody wants to touch, let > alone maintain, with no way to test. The easy way out is to update just > the code you actually care about, then call the conversion process > "incremental". Well, it's not incremental without commitments to > increment.
I don't think we should do this until we can come up with a scheme that has less impact on the code being modified. > >> >> - there's no nice way to propagate the error > >> >> since it has already been reported. If the function then wants to > >> >> explicitly ignore the error, then that's impossible too,since it has > >> >> already been reported. Add in our code which doesn't use error_report > >> >> and instead returns errno values, such as the block layer, and it gets > >> >> even worse because if that calls a function which propagates an error, > >> >> it has to throw away that useful error and return a useless invented > >> >> errno value :( > >> > >> Different bug: when you receive an Error, you have to either handle and > >> consume it, or pass it on. Throwing it away and returning an error code > >> instead counts as neither, and is a bug. > > > > Totally agree its a bug - just again hits the reality of having to > > do major code conversions. > > > >> >> IMHO continuing to convert code to propagate errors is the only way > >> >> out of this swamp, because it provides the greatest flexibility to > >> >> the callers of said functions to decide how to deal with the error. > >> > >> I wouldn't call the whole situation a swamp. error_report() is just > >> fine in many places. So is returning -errno. We convert to Error when > >> these techniques cease to be fine. The swampy part is old code where > >> these techniques have never been fine, or at least not for a while. To > >> be drained incrementally. > > > > Realistically all the major backend subsystems (chardev, network, block, > > ui and migration) need to be converted to Error ** propagation, since > > they all ultimately call into some common code that reports Error **. > > Infrastucture generally doesn't know how it's used, which means > error_report() is generally wrong there. Sufficiently simple functions > can keep returning -errno, null, whatever, but the interesting stuff > needs to use Error. > > Very few places will end up being able to stick with -errno, or plain > > error_report in the long term. > > Not sure about "very few". Less than now. We'll see. I'd also prefer we got the very-few level; Migration used to be characterised by getting a 'load of migration failed -22' and having no clue in the logs to why; I've slowly fought back to be able to get an error from the lowest level that caused the failure. I want more of that, so that when someone gets a rare failure in the field I can see why. Dave > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK