On Thu, 19 Mar 2026 19:04:37 +0000
David Matlack <[email protected]> wrote:

> On 2026-03-17 02:42 PM, Rubin Du wrote:
> > Add a new VFIO PCI driver for NVIDIA GPUs that enables DMA testing
> > via the Falcon (Fast Logic Controller) microcontrollers. This driver
> > extracts and adapts the DMA test functionality from the NVIDIA
> > gpu-admin-tools project and integrates it into the existing VFIO
> > selftest framework.
> > 
> > The Falcon is a general-purpose microcontroller present on NVIDIA GPUs
> > that can perform DMA operations between system memory and device memory.
> > By leveraging Falcon DMA, this driver allows NVIDIA GPUs to be tested
> > alongside Intel IOAT and DSA devices using the same selftest infrastructure.
> > 
> > Supported GPUs:
> > - Kepler: K520, GTX660, K4000, K80, GT635
> > - Maxwell Gen1: GTX750, GTX745
> > - Maxwell Gen2: M60
> > - Pascal: P100, P4, P40
> > - Volta: V100
> > - Turing: T4
> > - Ampere: A16, A100, A10
> > - Ada: L4, L40S
> > - Hopper: H100
> > 
> > The PMU falcon on Kepler and Maxwell Gen1 GPUs uses legacy FBIF register
> > offsets and requires enabling via PMC_ENABLE with the HUB bit set.
> > 
> > Limitations and tradeoffs:
> > 
> > 1. Architecture support:
> >    Blackwell and newer architectures may require additional work
> >    due to firmware.
> > 
> > 2. Synchronous DMA operations:
> >    Each transfer blocks until completion because the reference
> >    implementation does not expose command queuing - only one
> >    DMA operation can be in flight at a time.  
> 
> Asynchronous DMA will be important for testing Live Update:
> 
>   https://lore.kernel.org/kvm/[email protected]/
> 
> That is why I split memcpy_start() and memcpy_wait() from the beginning.
> 
> Would it be possible to add support for it here even though it is not in
> the reference implementation?

I'll leave the can-we questions to Rubin, but do you see either the MSI
or asynchronous issues as blockers?  Currently our driver tests are
limited to a very narrow range of Intel server platforms, whereas this
is a pluggable endpoint we can install anywhere.  I'd think that's
sufficiently valuable in expanding the test base to make some
compromises.  Thanks,

Alex
