> the outcome quoted below is awesome

Thanks @Andreas! :)
> Some parameter was considered a lower limit before, and now it's no longer
> a lower limit.

Correct. Without my patch, if a user sets opt/ovmf/X-PciMmio64Mb, that value
is only used if it is *higher* than the value that PlatformDynamicMmioWindow
computes - so by specifying it, you tell OVMF that the MMIO window size should
be "no lower than opt/ovmf/X-PciMmio64Mb". With my patch, by specifying
opt/ovmf/X-PciMmio64Mb, you tell OVMF that the MMIO window size should be
"exactly opt/ovmf/X-PciMmio64Mb".

> Is it documented and known that prior to this change opt/ovmf/X-PciMmio64Mb
> is a lower limit?

Not really. This was something that I only learned as part of my discussion
with edk2 upstream, when the maintainer clarified it for me [0]. AFAIK this
isn't outlined explicitly anywhere in the edk2 docs.

> Would users be surprised (negatively) that now it's no longer a limit of
> any kind?

I doubt it. If any users are specifying a lower value for
opt/ovmf/X-PciMmio64Mb than what PlatformDynamicMmioWindow calculates, their
opt/ovmf/X-PciMmio64Mb value is simply being ignored entirely today. I suppose
there could be some configuration out there where someone specified a lower
value which was ignored, then never removed it when it had no effect - and
that user could be impacted after this update if the value is lower than what
their GPU requires. But aside from "I forgot to remove this dead
configuration", there should not be any actual users of such a construct on
Noble. Really, the common use case for opt/ovmf/X-PciMmio64Mb up until now has
been to make the window size *larger* than what PDMW computes - and that
behavior is not impacted by this patch.

> I can now set "the aperture" to a value lower than opt/ovmf/X-PciMmio64Mb

Not quite - this patch enables you to set opt/ovmf/X-PciMmio64Mb to a value
lower than *the MMIO window size computed by PlatformDynamicMmioWindow*, and
actually have the opt/ovmf/X-PciMmio64Mb value be honored.
(In other words, opt/ovmf/X-PciMmio64Mb now *is* the aperture size, rather
than just the minimum allowed aperture size that may be overridden by PDMW.)
Without my patch, an aperture size setting in opt/ovmf/X-PciMmio64Mb which is
lower than the aperture size calculated by PDMW is simply ignored.

> From what I understood of the patch description, this makes the "long PCI
> initialization process be skipped". What are the consequences of that skip?

The long PCI init process is only skipped if the value you specify for
opt/ovmf/X-PciMmio64Mb is lower than the GPU's actual MMIO window size. (So,
for example, if you have a 128GB VRAM GPU and you set opt/ovmf/X-PciMmio64Mb
to 1GB, that GPU's PCI init steps are skipped.) However, if your GPU only has
32GB VRAM and you set opt/ovmf/X-PciMmio64Mb to 64GB, there should be no
change in behavior (thus no init skip). On its own, the skip means that the
MMIO regions for the impacted devices are not usable (since they could not be
mapped). The reason we need pci=nocrs is to tell the kernel to disregard the
mappings reported by OVMF, and pci=realloc is to have the kernel compute its
own mappings (as I understand it).

> Isn't there a less intrusive way to have that skip happen, instead of
> setting a lower aperture size? Sounds like we are just taking advantage of
> a side effect of setting the aperture to a value lower than the previous
> limit. Why not, let's say, a new option called "pci_initialization_skip=1"?
> Because that's a kernel change?
Theoretically, a kernel change could enable this, but I would be hesitant to
pursue it for a few reasons:

1) The method I'm proposing enables Noble users to opt into behavior that is
essentially functionally equivalent to how things work in Jammy OVMF today,
and from the perspective of the Nvidia DGX team, we already have years of
Nvidia testing coverage on Jammy showing that `pci=realloc pci=nocrs` with a
lower aperture size than the DGX GPUs support is sufficient for their
customers and doesn't cause any known issues. A novel kernel patch would not
benefit from this history, and I doubt it would be less intrusive from a code
PoV than my proposed solution.

2) IMO, even outside of its usefulness for our workaround,
opt/ovmf/X-PciMmio64Mb should always have honored any specified value; the
fact that it doesn't is arguably a bug in itself, and one that is not
documented clearly anywhere.

3) Your proposed kernel approach would impact *any* PCI initialization, as I
understand from what you described, which I believe could have a larger
'blast radius' when set than my OVMF approach, which would only impact
devices with BARs larger than the memory size the user explicitly sets.

More generally, my concern is that this kind of kernel approach would end up
being significant additional development and testing effort to get to a point
that is ultimately no less of a 'workaround' than this OVMF approach. (In an
ideal world, we would be able to actually backport the kernel series that
fixes the underlying issue "for real" without any workarounds [3], but I have
discussed this with the kernel team already, and they do not think it would
be a reasonable SRU request for 6.8, since it would also require backporting
huge pfnmap support as a prerequisite, which contains substantial ABI
changes.)
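For concreteness, opt/ovmf/X-PciMmio64Mb is an fw_cfg knob, so it is set from
the host per guest rather than inside the VM. A sketch of how it is typically
passed (the 1024 MB value is illustrative, not a recommendation):

```shell
# Directly on the QEMU command line: fw_cfg names under "opt/" are reserved
# for user-defined items, and OVMF reads this one as a size in MB.
qemu-system-x86_64 \
    -fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=1024 \
    ...   # rest of the VM configuration elided

# For a libvirt guest, the equivalent is a QEMU command-line passthrough in
# the domain XML (requires the qemu XML namespace on the <domain> element):
#
#   <qemu:commandline>
#     <qemu:arg value='-fw_cfg'/>
#     <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=1024'/>
#   </qemu:commandline>
```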
> In other words, isn't this change just kicking the can down the road, and
> making something else slow now, something that used to be fast (like
> post-boot operations)?

We have not observed any post-boot slowness or instability with this
workaround. The upstream PCI maintainer and I think [4] this is because, in
scenarios where the GPU is being configured post-boot, decode is already
disabled by the time we reach the decode-disable step in
drivers/pci/probe.c [1], which allows the slow parts (the
pci_write_config_word(dev, PCI_COMMAND, orig_cmd &
~PCI_COMMAND_DECODE_ENABLE) call and its corresponding re-enable later in the
function) to be skipped entirely.

> kernel command-line: "pci=realloc pci=nocrs".
> c1) Are the mentioned options default?

No, and they probably shouldn't be defaults, since they really would only be
needed within guest VM kernels in this very specific type of configuration.

> c2) Are users expected to be using those options with this kind of
> hardware?

Setting `pci=nocrs pci=realloc` is the approach we have been recommending to
Nvidia and their customers on Jammy for a few years [2], since Jammy requires
non-default options to get GPU passthrough working at all. (On Jammy, users
can either set `pci=nocrs pci=realloc` in the guest, *OR* set
opt/ovmf/X-PciMmio64Mb to a value large enough for their GPUs - but the
latter is not the recommended approach, since it produces the same slow boot
as we are trying to avoid with this workaround.)

> c3.2) If the host is updated with this new edk2, running guests would only
> be impacted if they reboot, I presume?

Correct.

> d) From what I understood, impacted users would still have to make changes
> to their systems in order to take advantage of this update, right? Is that
> change "obvious"? Which value should they set their aperture size to? How
> can they find that out?

Correct. Impacted users will need to do the following to see any change in
behavior:

1. Install the updated OVMF on their host
2.
Set `pci=realloc pci=nocrs` in their guest kernel command line, then run
`sudo update-grub` in the guest
3. Set opt/ovmf/X-PciMmio64Mb in their per-guest QEMU or libvirt
configuration to any arbitrary value *lower* than the amount of VRAM their
passed-through GPUs have, then restart the guest VM

Historically, it seems like Nvidia has not had any issues deploying changes
of this nature internally, and they have not expressed any concerns about
getting their customers on board with this approach since I proposed it to
them. (It would be similar, from their POV, to what they had to do to get
Jammy passthrough working on these platforms initially.)

Let me know if you have additional questions.

[0]: https://edk2.groups.io/g/devel/message/120831
[1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/probe.c?h=v6.12.1#n189
[2]: https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/comments/10
[3]: https://lore.kernel.org/all/20250205231728.2527186-1-alex.william...@redhat.com/
[4]: https://lore.kernel.org/all/20241203163045.3e068562.alex.william...@redhat.com/

https://bugs.launchpad.net/bugs/2101903

Title:
  Backport "OvmfPkg: Use user-specified opt/ovmf/X-PciMmio64Mb value
  unconditionally" to Noble