Reassigning to @dannf since the next step is to pull this into Debian, and then into Plucky.

** Description changed:

+ SRU Justification:
+ 
+ [ Impact ]
+ Due to an inefficiency in the way older host kernels manage pfnmaps for guest VM memory ranges[0], guests with large-BAR GPUs passed through have a very long (multiple minutes) initialization time when the MMIO window advertised by OVMF is sufficiently sized for the passed-through BARs (i.e., the correct OVMF behavior). However, in the past, users have benefited from fast guest boot times when OVMF advertised an MMIO window that was too small to accommodate the full BAR, since this resulted in the long PCI initialization process being skipped (and retried later in a way that omitted the slow path, if pci=realloc pci=nocrs were set). While the root cause is being fully addressed in the upstream kernel[1], the solution relies on huge pfnmap support, which is not expected to be backported into the 6.8 or 6.11 -generic kernels. As a result, the only kernel improvement supported on those kernels is this patch[2], which reduces the extra boot time by about half. Unfortunately, that boot time is still an average of 1-3 minutes longer per VM boot than what can be achieved when the host is running a version of OVMF without PlatformDynamicMmioWindow (PDMW) support (introduced in [3]), as was the case in Jammy's version of OVMF.
- Since there is no way to force the use of the classic MMIO window
- size[4] in any version of OVMF after [3], and since we have a use case
- for such functionality on Noble that would yield significant, recurring
- compute time savings across all impacted VMs, I propose adding a boolean
- fw_cfg knob "opt/ovmf/X-PciMmioClassicWindow", which, when TRUE, would
- bypass the PDMW functionality in [3] and instead use the "classic" MMIO
- window. When set, users' guest VMs would need to set the `pci=nocrs
- pci=realloc` kernel options for passed-through GPUs to be usable, but
- boots would be short again, as they were in Jammy.
- I am *not* planning to submit this patch to upstream edk2 since this
- option should not be necessary for any users running newer kernels, once
- the requisite patches[1] land in the mainline tree.
+ [ Test Plan ]
+ 
+ I have confirmed that this cleanly applies to the latest Noble OVMF and
+ prepared a test PPA:
+ https://launchpad.net/~mitchellaugustin/+archive/ubuntu/edk2-honor-user-mmio-window
+ 
+ I have verified that the X-PciMmio64Mb knob works as expected for values large enough for the GPU MMIO windows (as supported by the original behavior) and for values smaller than what PDMW computes (newly introduced by this patch).
+ On DGX H100, forcing a value of 1024 (lower than required for passed-through GPUs) results in the desired fast boot time, with GPUs still being usable as long as pci=nocrs pci=realloc are set in the guest, even on legacy kernels. I also observed no regressions, and no change in behavior when X-PciMmio64Mb is absent or above the PDMW-calculated value.
+ 
+ [ Fix ]
+ 
+ Since there is no way to force the use of the classic MMIO window size[4]
+ in any version of OVMF after [3], and since we have a use case for such
+ functionality on legacy distro kernels that would yield significant,
+ recurring compute time savings across all impacted VMs, apply this
+ change to this knob's behavior to make this workaround possible on
+ Noble.
+ 
+ [ Where problems could occur ]
+ 
+ If there are user deployments on Noble in which X-PciMmio64Mb is
+ currently explicitly set to a value smaller than the PDMW-computed
+ value, those deployments are currently ignoring the X-PciMmio64Mb value
+ and instead using the value calculated by PDMW. If any such
+ deployments exist, *and* are specifying values that are too small for
+ their GPUs' MMIO windows, *and* do not have `pci=realloc pci=nocrs` set,
+ their passed-through GPUs will stop working until they either raise
+ X-PciMmio64Mb to be large enough for their MMIO windows, remove
+ X-PciMmio64Mb from their config (if PDMW's value is high enough), or add
+ `pci=nocrs pci=realloc` to their guest kernel config to obtain the
+ benefits of this patch.
+ 
+ However, from the perspective of OVMF, we are making the X-PciMmio64Mb
+ behavior more consistent, so I do not believe the above risk should be a
+ blocker for including this patch. (I also suspect that those
+ circumstances are uncommon: X-PciMmio64Mb only has an effect today when
+ it specifies a value larger than PDMW's, and users doing that will not
+ be impacted by this change.)
+ 
+ Additionally, this patch only adds new opt-in functionality and does not
+ impact anyone not using X-PciMmio64Mb, so it shouldn't have much
+ regression risk outside of that.

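For reviewers, the behavior change described under [ Where problems could occur ] can be sketched as a simplified model. This is not the edk2 code: the function and parameter names are hypothetical, the pre-patch branch only encodes what is stated above (a user value smaller than the PDMW-computed value is ignored), and the 262144 MB PDMW figure in the note below is an assumed placeholder, not a measurement. The helper assumes QEMU's `-fw_cfg name=...,string=...` syntax for passing the knob.

```python
from typing import List, Optional


def effective_mmio64_mb(pdmw_mb: int, user_mb: Optional[int], patched: bool) -> int:
    """Simplified model of which 64-bit MMIO window size (in MB) ends up in use.

    pdmw_mb  -- size computed by PlatformDynamicMmioWindow (PDMW)
    user_mb  -- value of opt/ovmf/X-PciMmio64Mb, or None if the knob is absent
    patched  -- True models the behavior with this backport applied
    """
    if user_mb is None:
        # Knob absent: the PDMW-computed size is used with or without the patch.
        return pdmw_mb
    if patched:
        # With the patch, the user-specified value is used unconditionally.
        return user_mb
    # Without the patch, a user value below the PDMW size is ignored.
    return max(user_mb, pdmw_mb)


def fw_cfg_args(user_mb: int) -> List[str]:
    """Build the QEMU arguments that set the knob (hypothetical helper)."""
    return ["-fw_cfg", f"name=opt/ovmf/X-PciMmio64Mb,string={user_mb}"]
```

With an assumed PDMW size of 262144 MB and a user value of 1024, the model returns 1024 when patched and 262144 when unpatched. A guest in the patched case would still need `pci=nocrs pci=realloc` for its passed-through GPUs to be usable.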

[0]: https://lore.kernel.org/all/cahta-uyp07fgm6t1ozqkqadsa5jrzo0reneyzgqzub4mdrr...@mail.gmail.com/
[1]: https://lore.kernel.org/all/20250205231728.2527186-1-alex.william...@redhat.com/
[2]: https://lore.kernel.org/all/20250111210652.402845-1-alex.william...@redhat.com/
[3]: https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650
[4]: https://edk2.groups.io/g/devel/topic/109651206?p=Created,,,20,1,0,0

** Summary changed:

- UBUNTU: SAUCE: Introduce X-PciMmioClassicWindow option to fw_cfg
+ Backport "OvmfPkg: Use user-specified opt/ovmf/X-PciMmio64Mb value unconditionally" to Noble

** Description changed:

+ Upstream patch: https://github.com/tianocore/edk2/pull/10856/commits/f8a8bb717c53c651750025aefaa5654f383bd02e
+ (To be added to Plucky via Debian)
+ 
  SRU Justification:

-- 
https://bugs.launchpad.net/bugs/2101903

Title:
  Backport "OvmfPkg: Use user-specified opt/ovmf/X-PciMmio64Mb value unconditionally" to Noble