Re: [Qemu-devel] Error propagation in generated visitors and command marshallers

Markus Armbruster Mon, 14 Apr 2014 01:00:07 -0700

Peter Crosthwaite <peter.crosthwa...@xilinx.com> writes:

> On Fri, Apr 11, 2014 at 11:41 PM, Markus Armbruster <arm...@redhat.com> wrote:
>> Peter Crosthwaite <peter.crosthwa...@xilinx.com> writes:
>>
>>> On Thu, Apr 10, 2014 at 1:48 AM, Markus Armbruster <arm...@redhat.com> 
>>> wrote:
>>>> I stumbled over this while trying to purge error_is_set() from the code.
>>>>
>>>>
>>>> Here's how we commonly use the Error API:
>>>>
>>>>     Error *err = NULL;
>>>>
>>>>     foo(arg, &err)
>>>>     if (err) {
>>>>         goto out;
>>>>     }
>>>>     bar(arg, &err)
>>>>     if (err) {
>>>>         goto out;
>>>>     }
>>>>
>>>> This ensures that err is null on entry, both for foo() and for bar().
>>>> Many functions rely on that, like this:
>>>>
>>>>     void foo(ArgType arg, Error **errp)
>>>>     {
>>>>         if (frobnicate(arg) < 0) {
>>>>             error_setg(errp, "Can't frobnicate");
>>>>                                 // This asserts errp != NULL
>>>>         }
>>>>     }
>>>>
>>>>
>>>> Here's how some of our visitor code uses the Error API (for real code,
>>>> check out generated qmp-marshal.c):
>>>>
>>>>     Error *err = NULL;
>>>>     QmpInputVisitor *mi = qmp_input_visitor_new_strict(QOBJECT(args));
>>>>     Visitor *v = qmp_input_get_visitor(mi);
>>>>     char *foo = NULL;
>>>>     char *bar = NULL;
>>>>
>>>>     visit_type_str(v, &foo, "foo", &err);
>>>>     visit_type_str(v, &bar, "bar", &err);
>>>>     if (err) {
>>>>         goto out;
>>>>     }
>>>>
>>>> Unlike above, this may pass a non-null errp to the second
>>>> visit_type_str(), namely when the first one fails.
>>>>
>>>> The visitor functions guard against that, like this:
>>>>
>>>>     void visit_type_str(Visitor *v, char **obj, const char *name, Error 
>>>> **errp)
>>>>     {
>>>>         if (!error_is_set(errp)) {
>>>>             v->type_str(v, obj, name, errp);
>>>>         }
>>>>     }
>>>>
>>>> As discussed before, error_is_set() is almost almost wrong, fragile or
>>>> unclean.  What if errp is null?  Then we fail to stop visiting after an
>>>> error.
>>>
>>> That should be the callers problem. If you pass a NULL errp then the
>>> intended semantic is to ignore errors.
>>
>> The *caller* isn't interested in an error.  But the callee's behavior
>> should *not* be affected by that at all other than not returning an
>> error.
>>
>> In particular, the callee should never continue after an error just
>> because the caller isn't interested in detailed error information.
>>
>
> But the error is from a previous call not the current call. Callers
> job to inform second call that first one failed (or in current status
> quo - not call the second call at all). But its caller job to know the
> dependancy otherwise calls are not self contained.
>
>> That's why "if (error_is_set(errp)) bail" and similar are always wrong
>> when errp is a parameter that may be null.
>>
>
> Agreed. Don't see the problem here though - it's bad caller code too.
>
>>>> The function could be improved like this:
>>>>
>>>>     void visit_type_str(Visitor *v, char **obj, const char *name, Error 
>>>> **errp)
>>>>     {
>>>>         assert(errp);
>>>
>>> And this is irregular in that you are now mandating the Error ** and
>>> thus removing the capability to ignore errors.
>>
>> It is irregular.  The irregularity is necessary as long as the function
>> isn't prepared for a null errp.
>>
>
> My understanding is all functions should be prepared for NULL errp.


If you combine "do nothing when called with an error set" semantics with
a "pass null to ignore errors" convention, you set yourself up for
accidental misuse.  Consider

    visit_type_str(v, &foo, "foo", errp);
    visit_type_str(v, &bar, "bar", errp);

where visit_type_str() does nothing when error_is_set(errp).  This
*breaks* when errp is null, but seeing that takes a non-local argument.

Here's a simple safety rule: if you do anything with an Error * but pass
it on, you must ensure it's not null, and you should make its
non-null-ness locally obvious whenever practical.

This rule should ensure that you can pass null to ignore errors without
changing behavior beyond ignoring the error.

>>>>         if (!*errp) {
>>>>             v->type_str(v, obj, name, errp);
>>>>         }
>>>>     }
>>>>
>>>>
>>>> But: is it a good idea to have both patterns in the code?  Should we
>>>> perhaps use the common pattern for visiting, too?  Like this:
>>>>
>>>>     visit_type_str(v, &foo, "foo", &err);
>>>>     if (err) {
>>>>         goto out;
>>>>     }
>>>>     visit_type_str(v, &bar, "bar", &err);
>>>>     if (err) {
>>>>         goto out;
>>>>     }
>>>>
>>>> Then we can assume *errp is clear on function entry, like this:
>>>>
>>>>     void visit_type_str(Visitor *v, char **obj, const char *name, Error 
>>>> **errp)
>>>>     {
>>>>         v->type_str(v, obj, name, errp);
>>>>     }
>>>>
>>>> Should execute roughly the same number of conditional branches.
>>>>
>>>> Tedious repetition of "if (err) goto out" in the caller, but that's what
>>>> we do elsewhere, and unlike elsewhere, these one's are generated.
>>>>
>>>> Opinions?
>>>
>>> I think this code as-is is a good example of what we should do
>>> elsewhere. The code base has bloated with the if (error) { bail; } on
>>> every Error ** accepting API call. I proposed a while back a semantic
>>> that Error ** Accepting APIs perform no action when the error is
>>> already set to allow for long sequences of calls to run without the
>>> constant checks. You then report the first error in a catchall at the
>>> end of the run.
>>>
>>> I think this particular code is probably good, provided your case of
>>> NULL errp is enforced against by the caller.
>>
>> My point isn't that this technique is bad, only that it's different from
>> what we do everywhere else, and the two techniques do not combine well.
>>
>> Here's how we handle errors everywhere else:
>>
>>     void frob([...], Error **errp)
>>     {
>>         Error *err = NULL;
>>
>>         foo(&err)
>>         if (err) {
>>             goto out;
>>         }
>>         bar(&err)
>>         if (err) {
>>             goto out;
>>         }
>>         [...]
>>     out:
>>         error_propagate(errp, err);
>>         [...]
>>     }
>>
>> Both foo() and bar() are never entered with an error set.  Consequently,
>> they don't check for that case.
>>
>> If you screw up and call them with an error set, they'll die on the
>> first error of their own, because error_set() asserts the error is
>> clear.
>>
>> You might be tempted to elide err, like this:
>>
>>         foo(errp)
>>         if (error_is_set(errp)) {
>>             goto out;
>>         }
>>         bar(errp)
>>         if (error_is_set(errp)) {
>>             goto out;
>>         }
>>         [...]
>>    out:
>>         [...]
>>
>> But that's *wrong*, because it executes bar() after foo() failed when
>> errp is null.  Ignoring errors from frob() also changes what frob()
>> does!  Not an acceptable interface.
>>
>> You can elide err only in cases where all you ever do with it is pass it
>> on unexamined.
>>
>
> Yeh so you must define an Error * locally in cases where you want the
> first call failure to inhibit the second. I think this is reasonable
> when its the callers wish (frob()) to behave so.

With error_is_set() purged, mistakes will hopefully be fairly obvious,
since "if (*errp)" begs the question "what if !errp?" much more than
error_is_set(errp) does, and makes mistakes crash hard instead of
sweeping them under the carpet.

>> An alternative technique is to partly move error checks into the
>> callees, like this:
>>
>>         err = NULL;
>>         foo(&err)
>>         bar(&err)
>>         if (err) {
>>             goto out;
>>         }
>>         [...]
>>     out:
>>         error_propagate(errp, err);
>>         [...]
>>
>> Now bar() must not do anything when called with an error set.  It needs
>> to begin with code like this:
>>
>>         if (error_is_set(errp) {
>>             return;
>>         }
>>
>
> The other compormise is to only bail the second function in its own
> error case. Let it run as normal and when it encounters an already set
> error, ignore it. This is useful when you have lots of independant
> calls one after the other . For an extreme example, check the
> ppc/e500.c device tree API calls which I want to convert to Error API
> but having to "if (err) { bail };" every one of them is not going to
> fly.

This is combining errors from independent calls.

The current interfaces are designed for a more common pattern: first
error in a chain of dependent calls must be handled.

If we actually have a need for combining errors, and the current
interfaces turn out to be too awkward for that, then let's extend them,
but without compromising their usability for the common pattern.

>> Trades bloating fewer call sites against bloating all the functions.
>> Tradeoff.
>>
>
> In many API cases, number of callers greatly outnumbers number of functions.

Yes, but you'd still have to check for errors at many call sites.  To
see how many, we'd need a diff.

>> Note that adding the check above to bar() makes bar() less suited to the
>> technique I described first.  Remember, with that technique, bar() must
>> not be called with an error set, and we rely on an assertion to protect
>> us against mistakes.  The check above completely bypasses the assertion.
>>
>> Moreover, having both techniques in the tree is bound to lead to
>> confusion.  Do I have to check after my function call?  Do I have to
>> check before doing anything in my function?  Sounds like a surefire
>> recipe for error handling botches to me.
>>
>> This is what I mean by "the two techniques do not combine well".  Let's
>> pick one and stick to it.
>>
>> The first technique is overwhelmingly prevalent now.  The only holdout
>> of the second technique is some QAPI-related code.
>>
>> I'm going to try converting that to the first technique.  If you can
>> come up with patches converting everything else to the second technique,
>> we can discuss which one is better :)
>>
>
> Yeh one day. Will be an awesome -ve diffstat if it happens though.

Maybe, maybe not (in more than one way) :)

Re: [Qemu-devel] Error propagation in generated visitors and command marshallers

Reply via email to