Alex Williamson <alex.william...@redhat.com> writes: > On Tue, 13 Sep 2016 08:16:20 +0200 > Markus Armbruster <arm...@redhat.com> wrote: > >> Cc: Alex for device assignment expertise. >> >> Cao jin <caoj.f...@cn.fujitsu.com> writes: >> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote: >> >> Cao jin <caoj.f...@cn.fujitsu.com> writes: >> >> >> >>> The input parameters is used for creating the msix capable device, so >> >>> they must obey the PCI spec, or else, it should be programming error. >> >> >> >> True when the the parameters come from a device model attempting to >> >> define a PCI device violating the spec. But what if the parameters come >> >> from an actual PCI device violating the spec, via device assignment? >> > >> > Before the patch, on invalid param, the vfio behaviour is: >> > error_report("vfio: msix_init failed"); >> > then, device create fail. >> > >> > After the patch, its behaviour is: >> > asserted. >> > >> > Do you mean we should still report some useful info to user on invalid >> > params? >> >> In the normal case, asking msix_init() to create MSI-X that are out of >> spec is a programming error: the code that does it is broken and needs >> fixing. >> >> Device assignment might be the exception: there, the parameters for >> msix_init() come from the assigned device, not the program. If they >> violate the spec, the device is broken. This wouldn't be a programming >> error. Alex, can this happen? >> >> If yes, we may want to handle it by failing device assignment. > > > Generally, I think the entire premise of these sorts of patches is > flawed. We take a working error path that allows a driver to robustly > abort on unexpected date and turn it into a time bomb. Often the > excuse for this is that "error handling is hard". Tough. Now a > hot-add of a device that triggers this changes from a simple failure to > a denial of service event. Furthermore, we base that time bomb on our > interpretation of the spec, which we can only validate against in-tree > devices. > > We have actually had assigned devices that fail the sanity test here, > there's a quirk in vfio_msix_early_setup() for a Chelsio device with > this bug. Do we really want user experiencing aborts when a simple > device initialization failure is sufficient? > > Generally abort code paths like this cause me to do my own sanity > testing, which is really poor practice since we should have that sanity > testing in the common code. Thanks,
I prefer to assert on programming error, because 1. it does double duty as documentation, 2. error handling of impossible conditions is commonly wrong, and 3. assertion failures have a much better chance to get the program fixed. Even when presence of a working error path kills 2., the other two make me stick to assertions. However, input out-of-spec is not a programming error. For most users of msix_init(), the arguments are hard-coded, thus invalid arguments are a programming error. For device assignment, they come from a physical device, thus invalid arguments can either be a programming error (our idea of "invalid" is invalid) or bad input (the physical device is out-of-spec). Since we can't know, we better handle it rather than assert. Bottom line: you convinced me msix_init() should stay as it is. But now msi_init() looks like it needs a change: it asserts on invalid nr_vectors parameter. Does that need fixing, Alex?