Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error

Markus Armbruster Thu, 29 Sep 2016 06:14:26 -0700

Alex Williamson <alex.william...@redhat.com> writes:

> On Tue, 13 Sep 2016 08:16:20 +0200
> Markus Armbruster <arm...@redhat.com> wrote:
>
>> Cc: Alex for device assignment expertise.
>> 
>> Cao jin <caoj.f...@cn.fujitsu.com> writes:
>> 
>> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:  
>> >> Cao jin <caoj.f...@cn.fujitsu.com> writes:
>> >>  
>> >>> The input parameters is used for creating the msix capable device, so
>> >>> they must obey the PCI spec, or else, it should be programming error.  
>> >>
>> >> True when the the parameters come from a device model attempting to
>> >> define a PCI device violating the spec.  But what if the parameters come
>> >> from an actual PCI device violating the spec, via device assignment?  
>> >
>> > Before the patch, on invalid param, the vfio behaviour is:
>> >   error_report("vfio: msix_init failed");
>> >   then, device create fail.
>> >
>> > After the patch, its behaviour is:
>> >   asserted.
>> >
>> > Do you mean we should still report some useful info to user on invalid
>> > params?  
>> 
>> In the normal case, asking msix_init() to create MSI-X that are out of
>> spec is a programming error: the code that does it is broken and needs
>> fixing.
>> 
>> Device assignment might be the exception: there, the parameters for
>> msix_init() come from the assigned device, not the program.  If they
>> violate the spec, the device is broken.  This wouldn't be a programming
>> error.  Alex, can this happen?
>> 
>> If yes, we may want to handle it by failing device assignment.
>
>
> Generally, I think the entire premise of these sorts of patches is
> flawed.  We take a working error path that allows a driver to robustly
> abort on unexpected date and turn it into a time bomb.  Often the
> excuse for this is that "error handling is hard".  Tough.  Now a
> hot-add of a device that triggers this changes from a simple failure to
> a denial of service event.  Furthermore, we base that time bomb on our
> interpretation of the spec, which we can only validate against in-tree
> devices.
>
> We have actually had assigned devices that fail the sanity test here,
> there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> this bug.  Do we really want user experiencing aborts when a simple
> device initialization failure is sufficient?
>
> Generally abort code paths like this cause me to do my own sanity
> testing, which is really poor practice since we should have that sanity
> testing in the common code.  Thanks,


I prefer to assert on programming error, because 1. it does double duty
as documentation, 2. error handling of impossible conditions is commonly
wrong, and 3. assertion failures have a much better chance to get the
program fixed.  Even when presence of a working error path kills 2., the
other two make me stick to assertions.

However, input out-of-spec is not a programming error.  For most users
of msix_init(), the arguments are hard-coded, thus invalid arguments are
a programming error.  For device assignment, they come from a physical
device, thus invalid arguments can either be a programming error (our
idea of "invalid" is invalid) or bad input (the physical device is
out-of-spec).  Since we can't know, we better handle it rather than
assert.

Bottom line: you convinced me msix_init() should stay as it is.  But now
msi_init() looks like it needs a change: it asserts on invalid
nr_vectors parameter.  Does that need fixing, Alex?

Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error

Reply via email to