Re: [Qemu-devel] [PATCH v0 0/9] QError

Markus Armbruster Mon, 19 Oct 2009 06:14:23 -0700

I just caught up with this topic, and since there's no single obvious
message to reply to, I'll just reply here.


I agree with Anthony that we'd better get the error reporting protocol
approximately right the first time, and rushing it could lead to tears
later on.  Mind, I said "approximately right", not "perfect".

However, there is more than one way to screw this up.  One is certainly
to fall short of client requirements in a way that is costly to fix
(think incompatible protocol revision).  Another is to overshoot them in
a way that is costly to maintain.  A third one is to spend too much time
on figuring out the perfect solution.

I believe our true problem is that we're still confused about and/or
disagree on client requirements, and this has led to a design that
tries to cover all the conceivable bases, and feels overengineered to
me.

There are only so many complexity credits you can spend in a program,
both globally and individually.  I'm very, very wary of making error
reporting more complex than absolutely, desperately necessary.
Reporting errors should remain as easy as we can make it, for reasons
that have already been mentioned by me and others:

* The more cumbersome it is to report an error, the less it is done, and
  the more vaguely it is done.  If you have to edit more than the error
  site to report an error accurately, then chances skyrocket that it
  won't be reported, or it'll be reported inaccurately.  And not because
  coders are stupid or lazy, but because they make sensible use of their
  very limited complexity credits: if you can either get the job done
  with lousy error messages, or not get it done at all, guess what the
  smart choice is.

* It's highly desirable for errors to do double duty as documentation.
  Code like this (random pick) doesn't need a comment:

        if (qemu_opt_get(opts, "vlan")) {
            qemu_error("The 'vlan' parameter is not valid with -netdev\n");
            return -1;
        }

  Bury the human-readable description in some far-away error list, and
  we lose that.  Even wrapping it in enough error object construction
  boiler-plate can easily lose it.

So, before we accept the cost of highly structured error objects, I'd
like to see a convincing argument for their need.  And actual client
developers like Dan are in a much better position to make such an
argument than server developers (like me) speculating about client
needs.

If I understand Dan correctly, machine-readable error code +
human-readable description is just fine, as long as the error code is
reasonably specific and the description is precise and complete.  Have
we heard anything else from client developers?

I'm not a client developer, but let me make a few propositions on client
needs anyway:

* Clients are just as complexity-bound as the server.  They prefer their
  errors as simple as possible, but no simpler.

* The crucial question for the client isn't "what exactly went wrong".
  It is "how should I handle this error".  Answering that question
  should be easy (say, check the error code).  Figuring out what went
  wrong should still be possible for a human operator of the client.

* Clients don't want to be tightly coupled to the server.

* No matter how smart and up-to-date the client is, there will always be
  errors it doesn't know.  And it needs to answer the "how to handle"
  question whether it knows the error code or not!  That's why protocols
  like HTTP have simple rules to classify error codes.

  Likewise, it needs to be able to give a human operator enough
  information to figure out what went wrong whether it knows the error
  or not.  How do you expect clients to format a structured error object
  for an error they don't know into human-readable text?  Isn't it much
  easier and more robust to cut out the formatting middle-man and send
  the text along with the error?
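
To make the classification point concrete, here is a minimal sketch,
assuming a made-up numbering scheme where the hundreds digit of the code
gives the error class, HTTP-style.  None of these names or numbers exist
in QEMU; they are invented purely for illustration.

```c
#include <stdio.h>

/* Hypothetical error classes; the hundreds digit of the code selects one. */
enum err_class {
    ERR_CLASS_UNKNOWN = 0,   /* code outside any class we know */
    ERR_CLASS_CLIENT  = 4,   /* request was bad: fix it, don't retry as is */
    ERR_CLASS_SERVER  = 5    /* server-side failure: retrying may help */
};

/* Classify any code, including ones this client has never seen. */
static enum err_class err_classify(int code)
{
    switch (code / 100) {
    case 4:  return ERR_CLASS_CLIENT;
    case 5:  return ERR_CLASS_SERVER;
    default: return ERR_CLASS_UNKNOWN;
    }
}

/* Decide how to handle an error from its class alone, and pass the
 * server's text through to the human operator unmodified. */
static int client_should_retry(int code, const char *desc)
{
    fprintf(stderr, "error %d: %s\n", code, desc);
    return err_classify(code) == ERR_CLASS_SERVER;
}
```

A client built this way can answer "how should I handle this" for a
code it has never heard of, and the operator still gets the full text.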

There's a general rule of programming that I've found quite hard to
learn and quite painful to disobey: always try the stupidest solution
that could possibly work first.

Based on what I've learned about client requirements so far, I figure
that solution is "easily classified error code + human-readable
description".
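
For concreteness, a minimal sketch of what such an error could look like
on the server side.  The struct, the function err_set(), and the code
value 400 are all hypothetical, not existing QEMU API.  The only point
is that the error site stays a single call with the human-readable text
right there.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error object: a classifiable code plus finished text. */
struct simple_error {
    int code;          /* machine-readable, for "how do I handle this" */
    char desc[256];    /* human-readable, for "what went wrong" */
};

/* Fill in an error at the error site with a printf-style description. */
static void err_set(struct simple_error *err, int code, const char *fmt, ...)
{
    va_list ap;

    err->code = code;
    va_start(ap, fmt);
    vsnprintf(err->desc, sizeof(err->desc), fmt, ap);
    va_end(ap);
}

/* Example error site, modeled on the 'vlan' check quoted earlier.
 * The code 400 is an invented value meaning "invalid parameter". */
static int check_netdev_param(const char *name, struct simple_error *err)
{
    if (strcmp(name, "vlan") == 0) {
        err_set(err, 400, "The '%s' parameter is not valid with -netdev",
                name);
        return -1;
    }
    return 0;
}
```

The message still documents the check, and nothing outside the error
site has to be edited to report it.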

What if we go with that now, and later realize that it was *too* stupid,
i.e. we need more structure after all?  I believe that'll only be
disastrous if we need an *incompatible* protocol revision to accommodate
it.  Can't we simply add a "data" member to the error object then?
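
A rough sketch of why that kind of extension can stay compatible,
written as C structs rather than wire format.  All names here are
hypothetical.  It assumes a client written against the original two
members simply never looks at the new one, and that NULL means "no
extra data".

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error object after a later, compatible extension. */
struct wire_error {
    int code;            /* original member */
    const char *desc;    /* original member */
    const void *data;    /* added later; optional, NULL when absent */
};

/* An "old" client handler: it uses only the original members, so errors
 * carrying extra data are handled exactly like errors without it. */
static int old_client_format(const struct wire_error *err,
                             char *buf, size_t len)
{
    return snprintf(buf, len, "error %d: %s", err->code, err->desc);
}
```

Only clients that want the extra structure need to learn about "data";
everybody else keeps working unchanged.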
