On Wed, Jan 13, 2021 at 1:06 PM Roger Pau Monné <roger....@citrix.com> wrote:
>
> On Wed, Jan 13, 2021 at 10:48:52AM -0500, Jason Andryuk wrote:
> > On Wed, Jan 13, 2021 at 8:11 AM Roger Pau Monné <roger....@citrix.com> 
> > wrote:
> > >
> > > On Wed, Jan 13, 2021 at 06:21:03AM +0000, Tian, Kevin wrote:
> > > > > From: Roger Pau Monne <roger....@citrix.com>
> > > > > Sent: Wednesday, January 13, 2021 1:33 AM
> > > > >
> > > > > Current interrupt pass though code will setup a timer for each
> > > > > interrupt injected to the guest that requires an EOI from the guest.
> > > > > Such timer would perform two actions if the guest doesn't EOI the
> > > > > interrupt before a given period of time. The first one is deasserting
> > > > > the virtual line, the second is perform an EOI of the physical
> > > > > interrupt source if it requires such.
> > > > >
> > > > > The deasserting of the guest virtual line is wrong, since it messes
> > > > > with the interrupt status of the guest. It's not clear why this was
> > > > > odne in the first place, it should be the guest the one to EOI the
> > > > > interrupt and thus deassert the line.
> > > > >
> > > > > Performing an EOI of the physical interrupt source is redundant, since
> > > > > there's already a timer that takes care of this for all interrupts,
> > > > > not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
> > > > > field.
> > > > >
> > > > > Since both of the actions performed by the dpci timer are not
> > > > > required, remove it altogether.
> > > > >
> > > > > Signed-off-by: Roger Pau Monné <roger....@citrix.com>
> > > > > ---
> > > > > As with previous patches, I'm having a hard time figuring out why this
> > > > > was required in the first place. I see no reason for Xen to be
> > > > > deasserting the guest virtual line. There's a comment:
> > > > >
> > > > > /*
> > > > >  * Set a timer to see if the guest can finish the interrupt or not. 
> > > > > For
> > > > >  * example, the guest OS may unmask the PIC during boot, before the
> > > > >  * guest driver is loaded. hvm_pci_intx_assert() may succeed, but the
> > > > >  * guest will never deal with the irq, then the physical interrupt 
> > > > > line
> > > > >  * will never be deasserted.
> > > > >  */
> > > > >
> > > > > Did this happen because the device was passed through in a bogus state
> > > > > where it would generate interrupts without the guest requesting
> > > >
> > > > It could be a case where two devices share the same interrupt line and
> > > > are assigned to different domains. In this case, the interrupt activity 
> > > > of
> > > > two devices interfere with each other.
> > >
> > > This would also seem to be problematic if the device decides to use
> > > MSI instead of INTx, but due to the shared nature of the INTx line we
> > > would continue to inject INTx (generated from another device not
> > > passed through to the guest) to the guest even if the device has
> > > switched to MSI.
> > >
> > > > >
> > > > > Won't the guest face the same issues when booted on bare metal, and
> > > > > thus would already have the means to deal with such issues?
> > > >
> > > > The original commit was added by me in ~13yrs ago (0f843ba00c95)
> > > > when enabling Xen in a client virtualization environment where interrupt
> > > > sharing is popular.
> > >
> > > Thanks, the reference to the above commit is helpful, I wasn't able to
> > > find it and it contains a comment that I think has been lost, which
> > > provides the background on why this was added.
> > >
> > > > I believe above comment was recorded for a real
> > > > problem at the moment (deassert resets the intx line to unblock further
> > > > interrupts). But I'm not sure whether it is still the case after both 
> > > > Xen and
> > > > guest OS have changed a lot. At least some test from people who
> > > > still use Xen in shared interrupt scenario would be helpful. Or, if such
> > > > usage is already niche, maybe we can consider disallow passing through
> > > > devices which share the same interrupt line to different domains and
> > > > then safely remove this dpci EOI trick.
> > >
> > > So the deassert done by timeout only deasserts the virtual line, but
> > > doesn't for example clear the IRR bit from the vIO-APIC pin, which
> > > will cause further interrupts to not be delivered anyway until a
> > > proper EOI (or a switch to trigger mode) is done to the pin.
> > >
> > > I think it's going to be complicated for me to find a system that has
> > > two devices I can passthrough sharing the same GSI.
> >
> > I have some laptops running OpenXT where the USB controller and NIC
> > share an interrupt, and I assign them to different domains.  Qubes
> > would hit this as well.
>
> Is there any chance you could try the patch and see if you can hit the
> issue it was trying to fix?

Would testing a backport to 4.12 be useful?  There were some file
renames, but it looks to apply.  The only difference is the 4.12
hvm_pirq_eoi hunk still has `(ent && ent->fields.mask) || `.  Maybe
backport commit eb298f32fac5ac "x86/dpci: EOI interrupt regardless of
its masking status" as well?

Testing staging would take a little longer to make a machine available.

I guess I'd also need to disable MSI for the two devices to ensure
they are both using the GSI?

Regards,
Jason

Reply via email to