(cc Gerd)

On Mon, 18 Nov 2024 at 20:26, mitchell.augustin via groups.io
<mitchell.augustin=canonical....@groups.io> wrote:
>
> Hello!
> We've identified an issue with OVMF that causes the boot time of VMs to be
> considerably slower (usually taking 10+ minutes more) under (at least) the
> following conditions:
> * CPU passthrough is used
> * Host has phys-bits > 40
> * GPU PCI passthrough is used
> This slowdown was not present before commit
> https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650
> and is still present in the latest upstream edk2 version. Without that
> patch, we are only able to utilize passed-through Nvidia GPUs when the kernel
> options `pci=nocrs pci=realloc` are set in the guest. With the patch, we no
> longer need those kernel opts in the guest, but PCI enumeration and BAR
> assignment of the passed-through GPUs (and some other boot steps that may or
> may not be related) proceed extremely slowly.
> I tested the following virt-install command on our DGX H100, running upstream
> OVMF @
> https://github.com/tianocore/edk2/commit/ef4f3aa3f7e3c28c7f0e1a3c35711f1a85becd71
> built with verbose debug output enabled to identify areas where the boot
> process appeared to be spending the most time. I have attached the full logs
> from that VM (hidon-slow-ovmf-verbose.txt) as well as a "human view" of what
> that process looked like to me, since I did not have accurate wall-clock
> timestamps in the console output (h100-verbose-vm-logs.txt).
>
> I also confirmed that this same issue is present under the same conditions as
> above on our DGX Station A100 when using a slightly different VM config
> (which I can provide if necessary), so it likely affects any host with enough
> physbits, when the CPU is passed through.
>
> Full lscpu output for DGX H100 is attached (as lscpu-h100.txt). In the guest
> VM, the address sizes were the same when CPU passthrough was used.
> For the A100 station, I logged Address sizes: 43 bits physical, 48 bits
> virtual from lscpu (I can get the rest of the lscpu output as well if it
> would be relevant). Strangely though, despite CPU passthrough being enabled
> there as well, the guest saw Address sizes: 48 bits physical, 48 bits virtual.
>
> Please let me know if there is any clarification or other information I can
> provide that could help you debug this issue.
>
> Thanks,
> Mitchell Augustin
>
> Attachments:
>
> hidon-slow-ovmf-verbose.txt
> h100-verbose-vm-logs.txt
> lscpu-h100.txt
>
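
For anyone comparing host and guest values against the "phys-bits > 40"
condition in the report, here is a minimal sketch in Python that reads the
address widths from the "Address sizes" line printed by lscpu. The function
names and the default threshold of 40 are illustrative assumptions based on
the report's wording. They are not part of OVMF, QEMU, or lscpu.

```python
import re


def parse_address_sizes(line):
    """Return (physical_bits, virtual_bits) from an lscpu 'Address sizes' line.

    Expects text such as:
        'Address sizes: 43 bits physical, 48 bits virtual'
    """
    match = re.search(r"(\d+)\s+bits physical,\s*(\d+)\s+bits virtual", line)
    if match is None:
        raise ValueError("not an 'Address sizes' line: %r" % line)
    return int(match.group(1)), int(match.group(2))


def exceeds_phys_bits(physical_bits, threshold=40):
    """True if the physical address width is above the threshold.

    The default of 40 is an assumption taken from the condition
    'Host has phys-bits > 40' quoted in the report.
    """
    return physical_bits > threshold


if __name__ == "__main__":
    # The two A100 values quoted in the report: host first, then guest.
    for label, text in [
        ("host", "Address sizes: 43 bits physical, 48 bits virtual"),
        ("guest", "Address sizes: 48 bits physical, 48 bits virtual"),
    ]:
        phys, virt = parse_address_sizes(text)
        print(label, phys, virt, exceeds_phys_bits(phys))
```

Running the same check on the host and inside the guest makes a mismatch
like the 43 versus 48 bits seen on the A100 station easy to spot.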