On Mon, Jan 29, 2024 at 05:38:55PM +0100, Eric Auger wrote:
> > There may be a separate argument for clearing bypass. With a coldplugged
> > VFIO device the flow is:
> >
> > 1. Map the whole guest address space in VFIO to implement boot-bypass.
> >    This allocates all guest pages, which takes a while and is wasteful.
> >    I've actually crashed a host that way, when spawning a guest with too
> >    much RAM.
> interesting
> > 2. Start the VM
> > 3. When the virtio-iommu driver attaches a (non-identity) domain to the
> >    assigned endpoint, then unmap the whole address space in VFIO, and most
> >    pages are given back to the host.
> >
> > We can't disable boot-bypass because the BIOS needs it. But instead the
> > flow could be:
> >
> > 1. Start the VM, with only the virtual endpoints. Nothing to pin.
> > 2. The virtio-iommu driver disables bypass during boot
> We needed this boot-bypass mode for booting with virtio-blk-scsi
> protected with virtio-iommu for instance.
> That was needed because we don't have any virtio-iommu driver in edk2 as
> opposed to intel iommu driver, right?
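
(Side note on steps 1 and 3 of the coldplug flow quoted above: with the
VFIO type1 backend this boils down to a VFIO_IOMMU_MAP_DMA over guest RAM,
which is what allocates and pins every page, and a matching
VFIO_IOMMU_UNMAP_DMA once the guest attaches a non-bypass domain. A rough
userspace sketch, not QEMU's actual code; the helper names and the
container_fd/guest_ram_* parameters are made up for illustration:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Step 1: identity-map guest RAM so boot-bypass DMA works */
static int map_guest_ram(int container_fd, void *guest_ram_hva,
                         uint64_t guest_ram_gpa, uint64_t guest_ram_size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)guest_ram_hva, /* HVA backing guest RAM */
        .iova  = guest_ram_gpa,            /* IOVA == GPA for bypass */
        .size  = guest_ram_size,
    };

    /* This is where all guest pages get allocated and pinned */
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

/* Step 3: tear the identity map down once the guest attaches a domain */
static int unmap_guest_ram(int container_fd, uint64_t guest_ram_gpa,
                           uint64_t guest_ram_size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = guest_ram_gpa,
        .size  = guest_ram_size,
    };

    /* Most pages go back to the host at this point */
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}

With bypass disabled from the start, neither call is needed until the
guest explicitly asks for an identity/bypass domain, which is the point
of the hotplug flow discussed below.)
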
Yes. What I had in mind is the x86 SeaBIOS which doesn't have any IOMMU
driver and accesses the default SATA device:

$ qemu-system-x86_64 -M q35 -device virtio-iommu,boot-bypass=off
qemu: virtio_iommu_translate sid=250 is not known!!
qemu: no buffer available in event queue to report event
qemu: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address

But it's the same problem with edk2. Also a guest OS without a
virtio-iommu driver needs boot-bypass.

Once firmware boot is complete, the OS with a virtio-iommu driver can
normally turn bypass off in the config space, since it's not useful
anymore. If it needs to put some endpoints in bypass, then it can attach
them to a bypass domain.

> > 3. Hotplug the VFIO device. With bypass disabled there is no need to pin
> >    the whole guest address space, unless the guest explicitly asks for an
> >    identity domain.
> >
> > However, I don't know if this is a realistic scenario that will actually
> > be used.
> >
> > By the way, do you have an easy way to reproduce the issue described here?
> > I've had to enable iommu.forcedac=1 on the command-line, otherwise Linux
> > just allocates 32-bit IOVAs.
> I don't have a simple generic reproducer. It happens when assigning this
> device:
> Ethernet Controller E810-C for QSFP (Ethernet Network Adapter E810-C-Q2)
>
> I have not encountered that issue with another device yet.
> I see on guest side in dmesg:
> [    6.849292] ice 0000:00:05.0: Using 64-bit DMA addresses
>
> That's emitted in dma-iommu.c iommu_dma_alloc_iova().
> Looks like the guest first tries to allocate an iova in the 32-bit AS
> and if this fails use the whole dma_limit.
> Seems the 32b IOVA alloc failed here ;-)

Interesting, are you running some demanding workload and a lot of CPUs?
That's a lot of IOVAs used up; I'm curious what kind of DMA pattern does
that.

Thanks,
Jean
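
P.S. For reference, the 32-bit-first fallback described above looks
roughly like the sketch below. It is a condensed paraphrase of
iommu_dma_alloc_iova() in drivers/iommu/dma-iommu.c, not verbatim kernel
code, and the 'forcedac' parameter stands in for the iommu.forcedac=
command-line switch mentioned earlier:

#include <linux/iova.h>
#include <linux/dma-mapping.h>
#include <linux/pci.h>

static dma_addr_t alloc_iova_sketch(struct iova_domain *iovad, size_t size,
                                    u64 dma_limit, struct device *dev,
                                    bool forcedac)
{
        unsigned long shift = iova_shift(iovad);
        unsigned long iova_len = size >> shift;
        unsigned long iova = 0;

        /* First try to give PCI devices an IOVA below 4G... */
        if (dma_limit > DMA_BIT_MASK(32) && !forcedac && dev_is_pci(dev))
                iova = alloc_iova_fast(iovad, iova_len,
                                       DMA_BIT_MASK(32) >> shift, false);

        /*
         * ...and only fall back to the whole dma_limit when the 32-bit
         * space is exhausted. This is the path that logs the
         * "Using 64-bit DMA addresses" notice shown in the guest dmesg.
         */
        if (!iova)
                iova = alloc_iova_fast(iovad, iova_len,
                                       dma_limit >> shift, true);

        return (dma_addr_t)iova << shift;
}

So with iommu.forcedac=1 the 32-bit attempt is skipped entirely, and
without it the notice only appears once the 32-bit space has filled up,
which is why I'm wondering what the E810 is doing to use up that many
IOVAs.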