Re: [Qemu-devel] [PATCH 00/10]: QError v4

Anthony Liguori Wed, 18 Nov 2009 10:09:23 -0800

Markus Armbruster wrote:

1. QError feels overengineered, particularly for a first iteration of a
   protocol.


   We go to great lengths to build highly structured error objects.
   There is only one sane reason for doing that: a demonstrated client
   need.  I can't see that need.

A GUI wants to present the user a mechanism to reduce the memoryconsumption of a VM. There are three reasons why this would not work.The first is that ballooning was not configured for this guest. Thesecond is that ballooning was configured but the version of the KVMkernel module does not have a sync mmu. The third is that anappropriate guest driver has not been loaded for the device.

If we don't specify a type of error, all the GUI can do is fail rightbefore trying to balloon the guest. That offers the user no insight whyit failed other than popping up a message box with an error message(that may not even be in their locale). For a non-english speaker, it'sgibberish.

If you use three types of errors, then a GUI can differentiate thesecases and presents the user different feedback. For the first case, itcan simply gray out the menu item. For the second can, it can also grayout the menu item but it can provide a one time indication (in say, astatus bar), that ballooning is unavailable because the kernel versiondoes not support it. For the third error, you would want to prompt theuser to indicate the driver is not loaded in the guest.

It's highly likely that for this last case, you'd want generic code togenerate this error. Further more, in order to generate the errormessage for a user, you need to know what device does not have afunctioning driver. You may say it's obvious for something like infoballoon but it's not so obvious if you assume we support things otherthan just virtio-pci-balloon. For instance, with s390, it's a differentballoon driver. For xenner/xenpv, it's a different driver. It reallysuggests we should send the qdev device name for the current balloon driver.

2. It falls short of the requirement that clients can reasonably
   classify errors they don't know.

I think this is wishful thinking. I strongly suspect that you can'tfind a reasonably popular web browser that doesn't know all of the errorcodes that web servers generate.


We should document all of the errors that each QMP function can return.

That said, if you thought there was 4-5 common classes of errors, addingan additional 'base_class' field could be a completely reasonable thingto do. As long as you're adding information vs. taking it away, I'mquite happy.

3. It falls short of the requirement that clients can easily present a
   human-readable error description to their human users, regardless of
   whether they know the error or not.

That's just incorrect. We provide an example conversion table that'sakin to strerror() or a __repr__ for an exception in Python.

In more detail[*]:

There are more than one way to screw up here.  One is certainly to fall
short of client requirements in a way that is costly to fix (think
incompatible protocol revision).  Another is to overshoot them in a way
that is costly to maintain.  A third one is to spend too much time on
figuring out the perfect solution.

I believe our true problem is that we're still confused and/or
disagreeing on client requirements, and this has led to a design that
tries to cover all the conceivable bases, and feels overengineered to
me.

I've described my requirements for what a client can do. I'd like tounderstand how you disagree.

There are only so many complexity credits you can spend in a program,
both globally and individually.  I'm very, very wary of making error
reporting more complex than absolutely, desperately necessary.

We're following a very well established model of error reporting(object-based exceptions). This is well established and well understood.

Reporting errors should remain as easy as we can make it.  The more
cumbersome it is to report an error, the less it is done, and the more
vaguely it is done.  If you have to edit more than the error site to
report an error accurately, then chances skyrocket that it won't be

reported, or it'll be reported inaccurately.

I don't buy these sort of arguments. We are not designing a librarythat is open to abuse. If our developers are fall victim to this, thenwe need to retrain them to be less lazy by not accepting patches that doa poor job reporting errors.

 And not because coders are
stupid or lazy, but because they make sensible use of their very limited
complexity credits: if you can either get the job done with lousy error
messages, or not get it done at all, guess what the smart choice is.

Error reporting is extremely important in a management scenario. Youare severely limited by what actions you can take based on the amount oferror information returned. This is particularly when dealing with amulti-client management mechanism. IMHO, this is a sensible place tospend complexity credits.

If I understand Dan correctly, machine-readable error code +
human-readable description is just fine, as long as the error code is
reasonably specific and the description is precise and complete.  Have
we heard anything else from client developers?

There are no client developers because QMP doesn't exist. Historically,we haven't provided high quality error messages so of course libvirtmakes due without them today.

But I can answer on behalf of the management work we've done based onlibvirt.

If you attempt to do a virDomainSetMemory() to a qemu domain, you onlyget an error if you're using a version of qemu that doesn't support theballoon command. You do not get an error at all for the case where thekernel module doesn't support ballooning or a guest driver isn't loaded.

The concrete use case here is building an autoballoon mechanism part ofa larger management suite. If you're in an environment where memoryovercommit is important, then you really do want to know whetherballooning isn't functioning because appropriate guest drivers aren'tloaded.

I'm not a client developer, but let me make a few propositions on client
needs anyway:

* Clients are just as complexity-bound as the server.  They prefer their
  errors as simple as possible, but no simpler.

I agree but the problem is that every client has a different definitionof "simple as possible". My client doesn't necessarily care about PCIhotplug error messages because it's not a function that we use.However, ballooning is tremendously important and I want to knoweverything I can about why it failed.

* The crucial question for the client isn't "what exactly went wrong".
  It is "how should I handle this error".  Answering that question
  should be easy (say, check the error code).  Figuring out what went
  wrong should still be possible for a human operator of the client.

I disagree. The client needs to know what went wrong if they are goingto take a corrective action.

* Clients don't want to be tightly coupled to the server.


I don't follow this.

* No matter how smart and up-to-date the client is, there will always be
  errors it doesn't know.  And it needs to answer the "how to handle"
  question whether it knows the error code or not!  That's why protocols
  like HTTP have simple rules to classify error codes.

HTTP is the exception and the HTTP error classes really leave an awfullot to be desired. If you feel it's important to add error classes, I'dbe happy to consider those patches.

  Likewise, it needs to be able to give a human operator enough
  information to figure out what went wrong whether it knows the error
  or not.  How do you expect clients to format a structured error object
  for an error they don't know into human-readable text?  Isn't it much
  easier and more robust to cut out the formatting middle-man and send
  the text along with the error?

We do that by providing a conversion table. The fundamental problemhere is localization.

* Clients need to present a human-readable error description to their
  human users, regardless of whether they know the error or not.

I would never pass an error string from a third party directly to auser. I doubt you'll find a lot of good applications that do. From ausability perspective, you need to be in control of your interactionswith the user. They grammar needs to be consistent and it has to belocalized. The best you would do with a third party string is log it tosome log file for later examination by support. In that case, dumpingthe class code and the supporting information in JSON form is just asgood as a text description.

There's a general rule of programming that I've found quite hard to
learn and quite painful to disobey: always try the stupidest solution
that could possibly work first.

Based on what I've learned about client requirements so far, I figure
that solution is "easily classified error code + human-readable
description".

I appreciate you feel strongly that this is the right thing to do. Thatsaid, I disagree and I have specific use cases in mind that are notsatisfied by the above mechanism. In a situation like this, I tend tothink the best thing to do is provide more information such thateveryone can get the function they want.

If we add a base_class member to the error structure and we have thetable for formatting error messages, then that should satisfy yourrequirements. If you're concerned about a client having to have anup-to-date version of the table, we can introduce a monitor command topretty print a json error object which would address that concern.

Then I think your argument boils down to, you think the remaininginfrastructure is unnecessary for what you anticipate the uses to be.But hopefully, you'll at least concede that there are valid use casesbeyond what you anticipate to be the normal case that do need thisinformation.

If we later realize that this solution was *too* stupid, we can simply
add a data member to the error object.

It's never that easy because a management tool has to support a leastcommon denominator.


--
Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH 00/10]: QError v4

Reply via email to