On Thu, May 14, 2026 at 9:28 AM Alex Williamson <[email protected]> wrote:
>
> On Wed, 13 May 2026 23:33:02 +0000
> David Matlack <[email protected]> wrote:
>
> > On 2026-05-13 11:49 AM, Josh Hilke wrote:
> > > On Tue, May 12, 2026 at 7:12 PM Josh Hilke <[email protected]> wrote:
> > > > On Mon, May 11, 2026 at 4:45 PM David Matlack <[email protected]> wrote:
> > > > > On 2026-05-11 09:18 PM, Josh Hilke wrote:
> >
> > > > > > +     retries = 100;
> > > > > > +     while (retries-- > 0) {
> > > > > > +             if (rx->wb.status_error & 1)
> > > > > > +                     break;
> > > > > > +             usleep(10);
> > > > > > +     }
> > > > >
> > > > > Why bail after a certain timeout? The test may have kicked off a large
> > > > > count of memcpys. Is this for error detection?
> > > >
> > > > The bailout was intended to detect errors during development.
> > > > Shouldn't need it anymore. I'll remove it in v2.
> > >
> > > Sorry, I forgot: we need the timeout to detect DMA errors for the
> > > memcpy_from_unmapped_iova test in vfio_pci_driver_test. The test
> > > triggers an IOMMU fault because the IOVA is unmapped, and the IOMMU
> > > aborts the DMA operation. However, the QEMU IGB implementation does
> > > not set an error bit, so timing out is our only method for error
> > > detection.
> >
> > Hm... that's going to be tricky then. This means we would have to set
> > the timeout to longer than the longest possible memcpy duration to avoid
> > false negatives? That means we'll have to set the timeout to quite long.
>
> FWIW, I had AI churn on trying to make this work on a physical 82576, as
> I have several of these in my local machines as sort of the de facto,
> readily available SR-IOV NIC.  The AI got up to 30/35 tests passing but
> is currently stuck because the queues stall in the mix-and-match test
> when it's trying to DMA from an unmapped IOVA.  So far none of the
> in-band methods to kick the queues seem to work; I'm not sure if we'll
> need to resort to an FLR.
>
> I'd be happy to send the changes it's made so far if you want to
> validate and incorporate them, or if you have any thoughts on kicking it
> after the IOMMU fault.  Some of the changes are related to timeouts,
> where QEMU loopback is actually faster than bare metal since the
> physical queues run at 1Gbps even in loopback mode.
>
> I'll also plant the seed that if we do have outstanding issues for a
> driver that binds to a real world device, but only works on the
> emulated version of that device... how do we handle that?  In part, I
> think it's emulated in QEMU because it is so ubiquitous.  I'm also
> hoping to use the same device for the new SR-IOV selftests.  Thanks,
>
> Alex

I'm glad you're interested in this as well!

Unfortunately (and ironically) I don't have access to a physical device.

Regarding driver support for the real vs. emulated device, I think we
should prioritize supporting the emulated version. This approach
unlocks the ability for anyone to run VFIO/Live Update tests without
needing specific hardware. Once that's done, other folks can add
patches to update the driver if they want to use the physical device.
What do you think?

I plan to add SR-IOV support to this driver in a separate series after
the driver merges.
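
On the timeout question quoted above, here is a rough sketch of what a
deadline-based wait could look like in place of a fixed retry count. It
is only an illustration: wait_for_done(), now_ns(), the timeout_ms
parameter and the RX_STATUS_DD name for bit 0 are hypothetical and not
taken from the patch, and the status word is assumed to be a plain
32-bit value.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */
#endif
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical name for bit 0 of the write-back status (the "& 1" above). */
#define RX_STATUS_DD 0x1u

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *status until the done bit is set or timeout_ms has elapsed.
 * Returns true if the bit was observed, false on timeout. The status
 * is checked before the deadline, so a timeout of 0 still succeeds if
 * the descriptor is already complete.
 */
static bool wait_for_done(const volatile uint32_t *status,
			  unsigned int timeout_ms)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 };
	uint64_t deadline = now_ns() + (uint64_t)timeout_ms * 1000000ull;

	for (;;) {
		if (*status & RX_STATUS_DD)
			return true;
		if (now_ns() >= deadline)
			return false;
		nanosleep(&pause, NULL);	/* ~10us between polls */
	}
}
```

With something like this, the caller could choose timeout_ms per test
(for example, scaled by memcpy size and count), and the unmapped-IOVA
test could treat a false return as the expected outcome.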
