On Wed, Dec 02, 2015 at 01:09:34PM -0700, Alex Williamson wrote: > On Tue, 2015-12-01 at 13:23 +1100, David Gibson wrote: > > On Mon, Nov 23, 2015 at 02:58:11PM -0700, Alex Williamson wrote: > > > On Thu, 2015-11-19 at 15:29 +1100, David Gibson wrote: > > > > At present the code handling IBM's Enhanced Error Handling (EEH) > > > > interface > > > > on VFIO devices operates by bypassing the usual VFIO logic with > > > > vfio_container_ioctl(). That's a poorly designed interface with unclear > > > > semantics about exactly what can be operated on. > > > > > > > > In particular it operates on a single vfio container internally (hence > > > > the > > > > name), but takes an address space and group id, from which it deduces > > > > the > > > > container in a rather roundabout way. groupids are something that code > > > > outside vfio shouldn't even be aware of. > > > > > > > > This patch creates new interfaces for EEH operations. Internally we > > > > have vfio_eeh_container_op() which takes a VFIOContainer object > > > > directly. For external use we have vfio_eeh_as_ok() which determines > > > > if an AddressSpace is usable for EEH (at present this means it has a > > > > single container and at most a single group attached), and > > > > vfio_eeh_as_op() which will perform an operation on an AddressSpace in > > > > the unambiguous case, and otherwise returns an error. > > > > > > > > This interface still isn't great, but it's enough of an improvement to > > > > allow a number of cleanups in other places. > > > > > > > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > > > > --- > > > > hw/vfio/common.c | 77 > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > include/hw/vfio/vfio.h | 2 ++ > > > > 2 files changed, 79 insertions(+) > > > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > index 6797208..4733625 100644 > > > > --- a/hw/vfio/common.c > > > > +++ b/hw/vfio/common.c > > > > @@ -1002,3 +1002,80 @@ int vfio_container_ioctl(AddressSpace *as, > > > > int32_t groupid, > > > > > > > > return vfio_container_do_ioctl(as, groupid, req, param); > > > > } > > > > + > > > > +/* > > > > + * Interfaces for IBM EEH (Enhanced Error Handling) > > > > + */ > > > > +static bool vfio_eeh_container_ok(VFIOContainer *container) > > > > +{ > > > > + /* A broken kernel implementation means EEH operations won't work > > > > + * correctly if there are multiple groups in a container */ > > > > + > > > > + if (!QLIST_EMPTY(&container->group_list) > > > > + && QLIST_NEXT(QLIST_FIRST(&container->group_list), > > > > container_next)) { > > > > + return false; > > > > + } > > > > + > > > > + return true; > > > > +} > > > > > > Seems like there are ways to make this a non-eeh specific function, > > > vfio_container_group_count(), vfio_container_group_empty_or_singleton(), > > > etc. > > > > I guess, but I don't know of anything else that needs to know, so is > > there a point? > > Yes, long term maintainability. Simple functions that are named based > on what they do are building blocks for other users, even if we don't > yet know they exist. Functions tainted with the name and purpose of > their currently intended callers are cruft and code duplication waiting > to happen.
Ok, point taken. > > Actually.. I could do with a another opinion here: so, logically EEH > > operations should be possible on a container basis - the kernel > > interface correctly reflects that (my previous comments that the > > interface was broken were mistaken). > > > > The current kernel implementation *is* broken (and is non-trivial to > > fix) which is what this test is about. But is checking for a probably > > broken kernel state something that we ought to be checking for in > > qemu? As it stands when the kernel is fixed we'll need a new > > capability so that qemu can know to disable this test. > > > > Should we instead just proceed with any container and just advise > > people not to attach multiple groups until the kernel is fixed? > > > > A relevant point here might be that while I haven't implemented it so > > far, I think it will be possible to workaround the broken kernel with > > full functionality by forcing each group into a separate container and > > using one of a couple of possible different methods to handle EEH > > functionality across multiple containers on a vPHB. > > This sounds vaguely similar to the discussions we're having around AER > handling. We really need to be able to translate a guest bus reset to a > host bus reset to enable guest participation in AER recovery, but iommu > grouping doesn't encompass any sort of shared bus property on x86 like > it does on power. Therefore the configurations where we can enable AER > are only a subset of what we can enable otherwise. However, not > everyone cares about AER recovery and perhaps the same is true of EEH. > So you really don't want to prevent useful configurations if the user > doesn't opt-in for that feature. > > So for AER we're thinking about a new vfio-pci option, aer=on, that > indicates the device must be in a configuration that supports AER or the > VM instantiation (or device hotplug) should fail. Should EEH do > something similar? Yes, I think that's a good idea. I'd been thinking about a PHB option for enabling EEH, but I think one on the devices themselves makes things work better. > Should we share an option to make life easier for > libvirt so it doesn't need to care about EEH vs AER? My initial thought is yes, but I'm not really sure if there are wrinkles that could make that a problem. > If the kernel > interface is eventually fixed, maybe that just relaxes some of the > configuration parameters making EEH support easier to achieve, but still > optional? Thanks, So, yes, and that's good, but that's not really what I was asking about. The kernel *interface* is not broken, just the implementation. Which means when it's fixed it won't be discoverable unless we also add a capability advertising the fix. So the question is: do we workaround in qemu until such a capability comes along, or just assume that it's (potentially) working and declare it a kernel problem if it doesn't? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature