On Fri, Mar 20, 2015 at 03:01:43PM +1100, Gavin Shan wrote:
>On Wed, Mar 18, 2015 at 03:54:09PM +1100, Gavin Shan wrote:
>>On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>>>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>>>> When the Linux guest recovers from an EEH error on the following Emulex
>>>> adapter, the MSIx interrupts are disabled and the INTx emulation
>>>> is enabled. One INTx interrupt is injected into the guest by the host
>>>> because of detected pending INTx interrupts on the adapter. QEMU
>>>> disables the mmap'ed BAR regions in the INTx interrupt handler and
>>>> starts a timer to re-enable those regions at a later point.
>>>> Unfortunately, "VFIOPCIDevice->intx.pending" isn't cleared, meaning
>>>> those disabled mmap'ed BAR regions won't be re-enabled properly. This
>>>> leads to EEH recovery failure on the guest side because of hung MMIO
>>>> accesses.
>>>>
>>>>   # lspci | grep Emulex
>>>>   0000:01:00.0 Ethernet controller: Emulex Corporation \
>>>>                OneConnect 10Gb NIC (be3) (rev 02)
>>>>   0000:01:00.1 Ethernet controller: Emulex Corporation \
>>>>                OneConnect 10Gb NIC (be3) (rev 02)
>>>>
>>>> The patch disables the INTx interrupt before doing the EEH reset to
>>>> avoid the issue.
>>>>
>>>> Signed-off-by: Gavin Shan <gws...@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/vfio/pci.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index fca1edc..bfa3d0c 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>>       * disable it so that it can be reenabled properly. Also,
>>>>       * the cached MSIx table should be cleared as it's not
>>>>       * reflecting the contents in hardware.
>>>> +     *
>>>> +     * We might have an INTx interrupt whose handler disables the
>>>> +     * memory mapped BARs. The INTx pending state can't be
>>>> +     * cleared with a memory BAR access in the slow path.
>>>> +     * The timer kicked by the INTx interrupt handler won't
>>>> +     * re-enable those disabled memory mapped BARs, which leads to
>>>> +     * hung MMIO register accesses and EEH recovery failure. We
>>>> +     * simply disable INTx if it has been enabled.
>>>>       */
>>>
>>>This feels like a quick hack for a problem we don't really understand.
>>>Why is INTx being fired through QEMU rather than KVM? Why isn't the
>>>INTx re-enabling happening, since this is exactly the scenario where it's
>>>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>>>BAR, mmap re-enabled, INTx unmasked)?
>>>
>>
>>Indeed. It's a quick hack before finding the root cause of why the slow
>>path doesn't work when the fast path is disabled. I'm still tracing it and
>>hopefully I can find something soon. Note that KVM IRQFD isn't enabled
>>on the system I was doing experiments on.
>>
>
>I'm not sure if there have been further replies to this thread since then.
>My mailbox has been broken for quite a while. Also, I don't know how Ben
>was dropped from the cc list. Adding him, as he might have more thoughts.
>Sorry for the mail flooding, Ben :-)
>
>The root cause of the problem would be specific to how MMIO regions are
>supported on KVM for book3s_hv, where invalid page table entries (PTEs)
>are built to track the address mapping between the guest virtual address
>and the host physical address. Since they're marked as "invalid",
>accessing the MMIO region causes a page fault in the host kernel; the PTE
>table is then searched to locate the address mapping, and the access is
>propagated to QEMU for emulation if the address mapping exists in the PTE
>table. That's how the "slow path" works.
>
>For the "fast path", KVM creates a memory slot and updates the PTE table
>with the mappings upon page fault. Those PTEs are "valid" and are seen by
>the CPU, which does the address translation automatically. When the memory
>region for the "fast path" is disabled from QEMU, the host purges the
>TLB entries/PTEs for the mappings.
>So when the "fast path" regions are disabled, the PTE table no longer has
>the entries required for "slow path" accesses. When the guest later tries
>to access the region, the host just doesn't know how to handle the case.
>
>I already have a couple of patches introducing an additional flag to
>identify the "fast path" region, which has properties of both RAM and
>MMIO regions. Once the region is disabled, the host purges its mappings
>from the PTEs/TLB and then updates the PTEs with the mappings required by
>the "slow path" region. With that, I didn't see the problem. Both the
>host and QEMU need changes. I'll remove this patch from the series and
>send out a new revision. The patches to address the "fast path" issue
>will be sent separately after checking with Ben or Paul whether they're
>worth sending out.
>
Checked with Paul Mackerras. He already has a patch fixing the issue in a
different way :-)

Thanks,
Gavin

>
>>
>>>>      QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>>          vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>>          }
>>>>
>>>>          msix_reset(&vdev->pdev);
>>>> +
>>>> +        /* Disable INTx */
>>>> +        if (vdev->interrupt == VFIO_INT_INTx) {
>>>> +            vfio_disable_intx(vdev);
>>>> +        }
>>>>      }
>>>>
>>>>      break;
>>>
>>>
>>>
>>>