Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

Alex Williamson Tue, 28 Mar 2017 19:56:10 -0700

On Wed, 29 Mar 2017 02:59:34 +0300
"Michael S. Tsirkin" <m...@redhat.com> wrote:


> On Tue, Mar 28, 2017 at 10:12:25AM -0600, Alex Williamson wrote:
> > On Tue, 28 Mar 2017 21:49:17 +0800
> > Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> >   
> > > On 03/25/2017 06:12 AM, Alex Williamson wrote:  
> > > > On Thu, 23 Mar 2017 17:09:23 +0800
> > > > Cao jin <caoj.f...@cn.fujitsu.com> wrote:
> > > >     
> > > >> Make use of the non fatal error eventfd that the kernel module provide
> > > >> to process the AER non fatal error. Fatal error still goes into the
> > > >> legacy way which results in VM stop.
> > > >>
> > > >> Register the handler, wait for notification. Construct aer message and
> > > >> pass it to root port on notification. Root port will trigger an 
> > > >> interrupt
> > > >> to signal guest, then guest driver will do the recovery.    
> > > > 
> > > > Can we guarantee this is the better solution in all cases or could
> > > > there be guests without AER support where the VM stop is the better
> > > > solution?
> > > >     
> > > 
> > > Currently, we only have VM stop on errors, that looks the same as a
> > > sudden power down to me.  With this solution, we have about
> > > 50%(non-fatal) chance to reduce the sudden power-down risk.  
> > 
> > If half of all faults are expected to be non-fatal, then you must have
> > some real examples of devices triggering non-fatal errors which can be
> > corrected in the guest driver that you can share to justify why it's a
> > good thing to enable this behavior.
> >   
> > > What if a guest doesn't support AER?  It looks the same as a host
> > > without AER support. Now I only can speculate the worst condition: guest
> > > crash, would that be quite different from a sudden power-down?  
> > 
> > Yes, it's very different.  In one case we contain the fault by stopping
> > the guest, in the other case we allow the guest to continue operating
> > with a known fault in the device which may allow the fault to propagate
> > and perhaps go unnoticed.  We have established with the current
> > behavior that QEMU will prevent further propagation of a fault by
> > halting the VM.  To change QEMU's behavior here risks that a VM relying
> > on that behavior no longer has that protection.  So it seems we either
> > need to detect whether the VM is handling AER or we need to require the
> > VM administrator to opt-in to this new feature.  
> 
> An opt-in flag sounds very reasonable. It can also specify whether
> to log the errors. We have a similar flag for disk errors.

An opt-in works, but is rather burdensome to the user.
 
> >  Real hardware has
> > these same issues and I believe there are handshakes that can be done
> > through ACPI to allow the guest to take over error handling from the
> > system.  
> 
> No, that's only for error reporting IIUC. Driver needs to be
> aware of a chance for errors to trigger and be able to
> handle them.

See drivers/acpi/pci_root.c:negotiate_os_control(), it seems that the
OSPM uses an _OSC to tell ACPI via OSC_PCI_EXPRESS_AER_CONTROL.  Would
that not be a reasonable mechanism for the guest to indicate AER
support?

> So yes, some guests might have benefitted from VM stop
> on AER but
> 1. the stop happens asynchronously so if guest can't handle
>    errors there's a chance it is already crashed by the time we
>    try to do vm stop

I fully concede that it's asynchronous, bad data can propagate and a
guest crash is one potential outcome.  That's fine, a guest crash
indicates a problem.  A VM stop also indicates a problem.  Potential
lack of a crash or VM stop is the worrisome case.

> 2. it's more of a chance by-product - we never promised
>    guests that VMs would be more robust than bare metal

Does that make it not a regression if we change the behavior?  I
wouldn't exactly call it a chance by-product, perhaps it wasn't the
primary motivation, but it was considered.  Thanks,

Alex

> > > >> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
> > > >> Signed-off-by: Cao jin <caoj.f...@cn.fujitsu.com>
> > > >> ---
> > > >>  hw/vfio/pci.c              | 202 
> > > >> +++++++++++++++++++++++++++++++++++++++++++++
> > > >>  hw/vfio/pci.h              |   2 +
> > > >>  linux-headers/linux/vfio.h |   2 +
> > > >>  3 files changed, 206 insertions(+)
> > > >>
> > > >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > >> index 3d0d005..c6786d5 100644
> > > >> --- a/hw/vfio/pci.c
> > > >> +++ b/hw/vfio/pci.c
> > > >> @@ -2432,6 +2432,200 @@ static void vfio_put_device(VFIOPCIDevice 
> > > >> *vdev)
> > > >>      vfio_put_base_device(&vdev->vbasedev);
> > > >>  }
> > > >>  
> > > >> +static void vfio_non_fatal_err_notifier_handler(void *opaque)
> > > >> +{
> > > >> +    VFIOPCIDevice *vdev = opaque;
> > > >> +    PCIDevice *dev = &vdev->pdev;
> > > >> +    PCIEAERMsg msg = {
> > > >> +        .severity = PCI_ERR_ROOT_CMD_NONFATAL_EN,
> > > >> +        .source_id = pci_requester_id(dev),
> > > >> +    };
> > > >> +
> > > >> +    if 
> > > >> (!event_notifier_test_and_clear(&vdev->non_fatal_err_notifier)) {
> > > >> +        return;
> > > >> +    }
> > > >> +
> > > >> +    /* Populate the aer msg and send it to root port */
> > > >> +    if (dev->exp.aer_cap) {    
> > > > 
> > > > Why would we have registered this notifier otherwise?
> > > >     
> > > >> +        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
> > > >> +        uint32_t uncor_status;
> > > >> +        bool isfatal;
> > > >> +
> > > >> +        uncor_status = vfio_pci_read_config(dev,
> > > >> +                            dev->exp.aer_cap + PCI_ERR_UNCOR_STATUS, 
> > > >> 4);
> > > >> +        if (!uncor_status) {
> > > >> +            return;
> > > >> +        }
> > > >> +
> > > >> +        isfatal = uncor_status & pci_get_long(aer_cap + 
> > > >> PCI_ERR_UNCOR_SEVER);
> > > >> +        if (isfatal) {
> > > >> +            goto stop;
> > > >> +        }    
> > > > 
> > > > Huh?  How can we get a non-fatal error notice for a fatal error?  (and
> > > > why are we saving this to a variable rather than testing it within the
> > > > 'if' condition?
> > > >    
> > > 
> > > Both of these are for the unsure corner cases.
> > > Is it possible that register reading shows a fatal error?
> > > Saving it into a variable just is personal taste: more neat.  
> > 
> > Why are there unsure corner cases?  Shouldn't the kernel have done this
> > check if there was any doubt whether the error was fatal or not?
> > Signaling the user with a non-fatal trigger for a fatal error certainly
> > doesn't make me have much confidence in this code.
> >   
> > > >> +
> > > >> +        error_report("%s sending non fatal event to root port. uncor 
> > > >> status = "
> > > >> +                     "0x%"PRIx32, vdev->vbasedev.name, uncor_status);
> > > >> +        pcie_aer_msg(dev, &msg);
> > > >> +        return;
> > > >> +    }
> > > >> +
> > > >> +stop:
> > > >> +    /* Terminate the guest in case of fatal error */
> > > >> +    error_report("%s: Device detected a fatal error. VM stopped",
> > > >> +          vdev->vbasedev.name);
> > > >> +    vm_stop(RUN_STATE_INTERNAL_ERROR);    
> > > > 
> > > > Shouldn't we use the existing error index if we can't make use of
> > > > correctable errors?
> > > >     
> > > 
> > > Why? If register reading shows it is actually a fatal error, is it the
> > > same as fatal error handler is notified?  what we use the existing error
> > > index for?  
> > 
> > See below.
> >   
> > > >> @@ -2860,6 +3054,8 @@ static void vfio_realize(PCIDevice *pdev, Error 
> > > >> **errp)
> > > >>          }
> > > >>      }
> > > >>  
> > > >> +    vfio_register_passive_reset_notifier(vdev);
> > > >> +    vfio_register_non_fatal_err_notifier(vdev);    
> > > > 
> > > > I think it's wrong that we configure these unconditionally.  Why do we
> > > > care about these unless we're configuring the guest to receive AER
> > > > events?
> > > >     
> > > 
> > > But do we have ways to know whether the guest has AER support? For now,
> > > I don't think so.  
> > 
> > By unconditionally here, I'm referring to not even testing whether the
> > device is in a VM topology where it can receive AER events.  If we
> > cannot signal the device, we don't need these additional triggers and
> > therefore we don't need the aer_cap test in the non-fatal error
> > handler, we can use the existing error index instead.  Enabling these
> > triggers at the point where the guest takes over error handling from
> > the system would be even better.
> >    
> > > If guest don't have AER support, for the worst condition: guest crash,
> > > it is not worse than a sudden power-down.  
> > 
> > Worst case is that a non-fatal error introduces a data corruption which
> > was previously noted via a VM stop (even if asynchronous notification
> > allowed some propagation) and now potentially goes unnoticed.  That's a
> > very big difference.  Thanks,
> > 
> > Alex

Re: [Qemu-devel] [PATCH v3 3/3] vfio-pci: process non fatal error of AER

Reply via email to