On Wed, 12 Mar 2025 23:50:02 +0800
Tomita Moeko <tomitamo...@gmail.com> wrote:

> A previous change made the OpRegion and LPC quirks independent of the
> exising legacy mode, update the docoumentation accordingly. More related
> topics, like creating EFI Option ROM of IGD for OVMF, how to solve the
> VFIO_DMA_MAP Invalid Argument warning, as well as details on IGD memory
> internals, are also added.
> 
> Signed-off-by: Tomita Moeko <tomitamo...@gmail.com>
> ---
>  docs/igd-assign.txt | 262 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 193 insertions(+), 69 deletions(-)
> 
> diff --git a/docs/igd-assign.txt b/docs/igd-assign.txt
> index e17bb50789..c7c4565906 100644
> --- a/docs/igd-assign.txt
> +++ b/docs/igd-assign.txt
> @@ -1,44 +1,69 @@
>  Intel Graphics Device (IGD) assignment with vfio-pci
>  ====================================================
>  
> -IGD has two different modes for assignment using vfio-pci:
> -
> -1) Universal Pass-Through (UPT) mode:
> -
> -   In this mode the IGD device is added as a *secondary* (ie. non-primary)
> -   graphics device in combination with an emulated primary graphics device.
> -   This mode *requires* guest driver support to remove the external
> -   dependencies generally associated with IGD (see below).  Those guest
> -   drivers only support this mode for Broadwell and newer IGD, according to
> -   Intel.  Additionally, this mode by default, and as officially supported
> -   by Intel, does not support direct video output.  The intention is to use
> -   this mode either to provide hardware acceleration to the emulated graphics
> -   or to use this mode in combination with guest-based remote access 
> software,
> -   for example VNC (see below for optional output support).  This mode
> -   theoretically has no device specific handling dependencies on vfio-pci or
> -   the VM firmware.
> -
> -2) "Legacy" mode:
> -
> -   In this mode the IGD device is intended to be the primary and exclusive
> -   graphics device in the VM[1], as such QEMU does not facilitate any sort
> -   of remote graphics to the VM in this mode.  A connected physical monitor
> -   is the intended output device for IGD.  This mode includes several
> -   requirements and restrictions:
> -
> -    * IGD must be given address 02.0 on the PCI root bus in the VM
> -    * The host kernel must support vfio extensions for IGD (v4.6)
> -    * vfio VGA support very likely needs to be enabled in the host kernel
> -    * The VM firmware must support specific fw_cfg enablers for IGD
> -    * The VM machine type must support a PCI host bridge at 00.0 (standard)
> -    * The VM machine type must provide or allow to be created a special
> -      ISA/LPC bridge device (vfio-pci-igd-lpc-bridge) on the root bus at
> -      PCI address 1f.0.
> -    * The IGD device must have a VGA ROM, either provided via the romfile
> -      option or loaded automatically through vfio (standard).  rombar=0
> -      will disable legacy mode support.
> -    * Hotplug of the IGD device is not supported.
> -    * The IGD device must be a SandyBridge or newer model device.
> +Using vfio-pci, we can passthrough Intel Graphics Device (IGD) to guest, 
> either
> +serve as primary and exclusive graphics adapter, or used in combination with 
> an
> +emulated primary graphics device, depending on the config and guest driver
> +support. However, IGD devices are not "clean" PCI devices, they use extra
> +memory regions other than BARs. Special handling is required to make them 
> work
> +properly, including:
> +
> +* OpRegion for accessing Virtual BIOS Table (VBT) that contains display 
> output
> +  information.
> +* Data Stolen Memory (DSM) region used as VRAM at early stage (BIOS/UEFI)
> +
> +Certain guest software also depends on following conditions to work:
> +(*-Required by)
> +
> +| Condition                                   | Linux | Windows | VBIOS | 
> EFI GOP |
> +|---------------------------------------------|-------|---------|-------|---------|
> +| #1 IGD has a valid OpRegion containing VBT  |  * ^1 |    *    |   *   |    
> *    |
> +| #2 VID/DID of LPC bridge at 00:1f.0 matches |       |         |   *   |    
> *    |
> +| #3 IGD is assigned to BDF 00:02.0           |       |         |   *   |    
> *    |
> +| #4 IGD has VGA controller device class      |       |         |   *   |    
> *    |
> +| #5 Host's VGA ranges are mapped to IGD      |       |         |   *   |    
>      |
> +| #6 Guest has valid VBIOS or UEFI Option ROM |       |         |   *   |    
> *    |
> +
> +^1 Though i915 driver is able to mock a OpRegion, it is still recommended to
> +   use the VBT copied from host OpRegion to prevent incorrect configuration.
> +
> +For #1, the "x-igd-opregion=on" option exposes a copy of host IGD OpRegion to
> +guest via fw_cfg, where guest firmware can set up guest OpRegion with it.
> +
> +For #2, "x-igd-lpc=on" option copies the IDs of host LPC bridge and host 
> bridge
> +to guest. Currently this is only supported on i440fx machines as there is
> +already an ICH9 LPC bridge present on q35 machines, overwriting its IDs may
> +lead to unexpected behavior.
> +
> +For #3, "addr=2.0" assigns IGD to 00:02.0.
> +
> +For #4, the primary display must be set to IGD in host BIOS.
> +
> +For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges.
> +
> +For #6, ROM either provided via the ROM BAR or romfile= option is needed, 
> this
> +Intel document [1] shows how to dump VBIOS to file. For UEFI Option ROM, see
> +"Guest firmware" section.
> +
> +QEMU also provides a "Legacy" mode that implicitly enables full functionality
> +on IGD, it is automatically enabled when
> +* Machine type is i440fx
> +* IGD is assigned to guest BDF 00:02.0
> +* ROM BAR or romfile is present
> +
> +In "Legacy" mode, QEMU will automatically setup OpRegion, LPC bridge IDs and
> +VGA range access, which is equivalent to:
> +  x-igd-opregion=on,x-igd-lpc=on,x-vga=on
> +
> +By default, "Legacy" mode won't fail, it continues on error. User can set
> +"x-igd-legacy-mode=on" to force enabling legacy mode, this also checks if the
> +conditions above for legacy mode is met, and if any error occurs, QEMU will
> +fail immediately. Users can also set "x-igd-legacy-mode=off" to disable 
> legacy
> +mode.
> +
> +In legacy mode, as the guest VGA ranges are assigned to IGD device, all other
> +graphics devices should be removed, this can be done using "-nographic" or
> +"-vga none" or "-nodefaults", along with adding the device using vfio-pci.
>  
>  For either mode, depending on the host kernel, the i915 driver in the host
>  may generate faults and errors upon re-binding to an IGD device after it
> @@ -73,31 +98,39 @@ DVI, or DisplayPort) may be unsupported in some use 
> cases.  In the author's
>  experience, even DP to VGA adapters can be troublesome while adapters between
>  digital formats work well.
>  
> -Usage
> -=====
> -The intention is for IGD assignment to be transparent for users and thus for
> -management tools like libvirt.  To make use of legacy mode, simply remove all
> -other graphics options and use "-nographic" and either "-vga none" or
> -"-nodefaults", along with adding the device using vfio-pci:
>  
> -    -device vfio-pci,host=00:02.0,id=hostdev0,bus=pci.0,addr=0x2
> +Options
> +=======
> +* x-igd-opregion=[on|*off*]
> +  Copy host IGD OpRegion and expose it to guest with fw_cfg
> +
> +* x-igd-lpc=[on|*off*]
> +  Creates a dummy LPC bridge at 00:1f:0 with host VID/DID (i440fx only)
> +
> +* x-igd-legacy-mode=[on|off|*auto*]
> +  Enable/Disable legacy mode
> +
> +* x-igd-gms=[hex, default 0]
> +  Overriding DSM region size in GGC register, 0 means uses host value.
> +  Use this only when the DSM size cannot be changed through the
> +  'DVMT Pre-Allocated' option in host BIOS.
>  
> -For UPT mode, retain the default emulated graphics and simply add the 
> vfio-pci
> -device making use of any other bus address other than 02.0.  libvirt will
> -default to assigning the device a UPT compatible address while legacy mode
> -users will need to manually edit the XML if using a tool like virt-manager
> -where the VM device address is not expressly specified.
>  
> -An experimental vfio-pci option also exists to enable OpRegion, and thus
> -external monitor support, for UPT mode.  This can be enabled by adding
> -"x-igd-opregion=on" to the vfio-pci device options for the IGD device.  As
> -with legacy mode, this requires the host to support features introduced in
> -the v4.6 kernel.  If Intel chooses to embrace this support, the option may
> -be made non-experimental in the future, opening it to libvirt support.
> +Examples
> +========
> +* Adding IGD with automatically legacy mode support
> +  -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0
>  
> -Developer ABI
> -=============
> -Legacy mode IGD support imposes two fw_cfg requirements on the VM firmware:
> +* Adding IGD with OpRegion and LPC ID hack, but without VGA ranges
> +  (For UEFI guests)
> +  -device 
> vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-opregion=on,x-igd-lpc=on,romfile=efi_oprom.rom
> +
> +
> +Guest firmware
> +==============
> +Guest firmware is responsible for setting up OpRegion and Base of Data Stolen
> +Memory (BDSM) in guest address space. IGD passthrough support imposes two
> +fw_cfg requirements on the VM firmware:
>  
>  1) "etc/igd-opregion"
>  
> @@ -117,17 +150,108 @@ Legacy mode IGD support imposes two fw_cfg 
> requirements on the VM firmware:
>     Firmware must allocate a reserved memory below 4GB with required 1MB
>     alignment equal to this size.  Additionally the base address of this
>     reserved region must be written to the dword BDSM register in PCI config
> -   space of the IGD device at offset 0x5C.  As this support is related to
> -   running the IGD ROM, which has other dependencies on the device appearing
> -   at guest address 00:02.0, it's expected that this fw_cfg file is only
> -   relevant to a single PCI class VGA device with Intel vendor ID, appearing
> -   at PCI bus address 00:02.0.
> +   space of the IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices using
> +   64-bit BDSM).  As this support is related to running the IGD ROM, which
> +   has other dependencies on the device appearing at guest address 00:02.0,
> +   it's expected that this fw_cfg file is only relevant to a single PCI
> +   class VGA device with Intel vendor ID, appearing at PCI bus address 
> 00:02.0.
> +
> +Upstream Seabios has OpRegion and BDSM (pre-Gen11 device only) support.
> +However, the support is not accepted by upstream EDK2/OVMF. A recommended
> +solution is to create a virtual OpRom with following DXE drivers:
> +
> +* IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (must)
> +* IntelGopDriver: Closed-source Intel GOP driver
> +* PlatformGopPolicy: Protocol required by IntelGopDriver
> +
> +IntelGopDriver and PlatformGopPolicy is only required when enabling GOP on 
> IGD.
> +
> +The original IgdAssignmentDxe can be found at [3]. A Intel maintained version
> +with PlatformGopPolicy for industrial computing is at [4]. There is also an
> +unofficially maintained version with newer Gen11+ device support at [5].
> +You need to build them with EDK2.
> +
> +For the IntelGopDriver, Intel never released it to public. You may contact
> +Intel support to get one as [4] said, if you are an Intel primer customer,

s/primer/premier/ ?

> +or you can try extract it from your host firmware using "UEFI BIOS 
> Updater"[6].
> +
> +Once you got all the required DXE drivers, a Option ROM can be generated with
> +EfiRom utility in EDK2, using
> +  EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \
> +  -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi
> +
> +
> +Known issues
> +============
> +When using OVMF as guest firmware, you may encounter the following warning:
> +warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 
> 0x7fd336000000) = -22 (Invalid argument)
> +Solution:
> +Set the host physical address bits to IOMMU address width using
> +  -cpu host,host-phys-bits-limit=<IOMMU address width>
> +Or in libvirt XML with
> +  <cpu>
> +    <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/>
> +  </cpu>
> +The IOMMU address width can be determined with
> +echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 
> 0x3F0000) >> 16) + 1 ))

That's handy!

> +Refer https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more 
> details
> +
> +
> +Memory View
> +===========
> +IGD has it own address space. To use system RAM as VRAM, a single-level page
> +table named Graphics Translation Table (GTT) is used for the address
> +translation. Each page table entry points a 4KB page. The translation flow 
> is:
> +
> +(PTE size 8)             +-------------+---+
> +                         |   Address   | V |  V: Valid Bit
> +                         +-------------+---+
> +                         | ...         |   |
> +IGD:0x01ae9010     0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^
> +-----------------> 0xd748| 0x42ba3e000 | 1 +------------------>
> +(addr << 12) * 8   0xd750| 0x42ba3f000 | 1 |
> +                         | ...         |   |
> +                         +-------------+---+

I think this was meant to be '(addr >> 12) * 8'.  A simpler
representation is just (addr >> 9), but maybe you're trying to
emphasize the PTE size here.

> +^ The address may be remapped by IOMMU
> +
> +The memory region store GTT is called GTT Stolen Memory (GSM), it is located
> +right below the Data Stolen Memory (DSM). Accessing this region directly is
> +not allowed, any access will immediately freeze the whole system. The only 
> way
> +to access it is through the second half of MMIO BAR0.
> +
> +The Data Stolen Memory is reserved by firmware, and acts as the VRAM in 
> pre-OS
> +environments. In QEMU, guest firmware (Seabios/OVMF) is responsible for
> +reserving a continuous region and program its base address to BDSM register,
> +then let VBIOS/GOP driver initializing this region. Illustration below shows
> +how DSM is mapped.
> +
> +       IGD Addr Space                 Host Addr Space         Guest Addr 
> Space
> +       +-------------+                +-------------+         +-------------+
> +       |             |                |             |         |             |
> +       |             |                |             |         |             |
> +       |             |                +-------------+         +-------------+
> +       |             |                | Data Stolen |         | Data Stolen |
> +       |             |                |   (Guest)   |         |   (Guest)   |
> +       |             |  
> +------------>+-------------+<------->+-------------+<--Guest BDSM
> +       |             |  | Passthrough |             | EPT     |             
> |   Emulated by QEMU
> +DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             
> |   Programmed by guest FW
> +       |             |  |             |             |         |             |
> +       |             |  |             |             |         |             |
> +      0+-------------+--+             |             |         |             |
> +                        |             +-------------+         |             |
> +                        |             | Data Stolen |         +-------------+
> +                        |             |   (Host)    |
> +                        +------------>+-------------+<--Host BDSM
> +                          Non-        |             |   "real" one in HW
> +                          Passthrough |             |   Programmed by host FW
> +                                      +-------------+
>  
>  Footnotes
>  =========
> -[1] Nothing precludes adding additional emulated or assigned graphics devices
> -    as non-primary, other than the combination typically not working.  I only
> -    intend to set user expectations, others are welcome to find working
> -    combinations or fix whatever issues prevent this from working in the 
> common
> -    case.
> +[1] 
> https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html
>  [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override
> +[3] 
> https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935
> +    Tianocore bugzilla was down since Jan 2025 :(
> +[4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 
> 0001-0004
> +[5] https://github.com/tomitamoeko/VfioIgdPkg
> +[6] 
> https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357

This is great and a much needed update.  Thanks!

With above corrections:

Reviewed-by: Alex Williamson <alex.william...@redhat.com>


Reply via email to