(cc Gerd)

On Mon, 18 Nov 2024 at 20:26, mitchell.augustin via groups.io
<mitchell.augustin=canonical....@groups.io> wrote:
>
> Hello!
> We've identified an issue with OVMF that makes VMs boot considerably more
> slowly (usually taking 10+ minutes longer) under at least the following
> conditions:
>  * CPU passthrough is used
>  * Host has phys-bits > 40
>  * GPU PCI passthrough is used
> This slowdown was not present before commit
> https://github.com/tianocore/edk2/commit/ecb778d0ac62560aa172786ba19521f27bc3f650
> and is still present in the latest upstream edk2 version. Without that
> patch, we can only use passed-through Nvidia GPUs when the kernel options
> `pci=nocrs pci=realloc` are set in the guest. With the patch, we no longer
> need those kernel options in the guest, but PCI enumeration and BAR
> assignment of the passed-through GPUs (and some other boot steps that may or
> may not be related) proceed extremely slowly.
> I tested the following virt-install command on our DGX H100, running upstream
> OVMF @
> https://github.com/tianocore/edk2/commit/ef4f3aa3f7e3c28c7f0e1a3c35711f1a85becd71
> built with verbose debug output enabled, to identify where the boot process
> appeared to be spending the most time. I have attached the full logs from
> that VM (hidon-slow-ovmf-verbose.txt) as well as a "human view" of what that
> process looked like to me (h100-verbose-vm-logs.txt), since I did not have
> accurate wall-clock timestamps in the console output.
>
> I also confirmed that the same issue is present under the same conditions on
> our DGX Station A100 when using a slightly different VM config (which I can
> provide if necessary), so it likely affects any host with enough phys-bits
> when the CPU is passed through.
>
> Full lscpu output for the DGX H100 is attached (as lscpu-h100.txt). In the
> guest VM, the address sizes were the same when CPU passthrough was used.
> For the DGX Station A100, lscpu on the host reported
> `Address sizes: 43 bits physical, 48 bits virtual` (I can get the rest of the
> lscpu output as well if it would be relevant). Strangely though, despite CPU
> passthrough being enabled there as well, the guest saw
> `Address sizes: 48 bits physical, 48 bits virtual`.
>
> Please let me know if there is any clarification or other information I can 
> provide that could help you debug this issue. Thanks,
> Mitchell Augustin
>
> Attachments:
>
> hidon-slow-ovmf-verbose.txt
> h100-verbose-vm-logs.txt
> lscpu-h100.txt
>
> 
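
For anyone trying to check whether a host or guest falls under the
"phys-bits > 40" condition above, here is a rough sketch (not part of the
original report) that pulls the address sizes out of lscpu or /proc/cpuinfo
text. The function names and the PHYS_BITS_THRESHOLD constant are hypothetical;
the threshold of 40 is simply the value quoted in the report, not something
confirmed from the OVMF sources.

```python
import re

# Assumption: 40 is taken from the report's "phys-bits > 40" condition.
PHYS_BITS_THRESHOLD = 40

# Matches both the lscpu form ("Address sizes: 43 bits physical, ...")
# and the /proc/cpuinfo form ("address sizes\t: 43 bits physical, ...").
_ADDRESS_SIZES = re.compile(
    r"address sizes\s*:\s*(\d+) bits physical,\s*(\d+) bits virtual",
    re.IGNORECASE,
)


def parse_address_sizes(text):
    """Return (physical_bits, virtual_bits) from the first match, or None."""
    match = _ADDRESS_SIZES.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exceeds_threshold(text, threshold=PHYS_BITS_THRESHOLD):
    """True if the reported physical address width is above the threshold."""
    sizes = parse_address_sizes(text)
    return sizes is not None and sizes[0] > threshold
```

Running parse_address_sizes(open("/proc/cpuinfo").read()) on both host and
guest would show whether the guest really inherits the host's physical address
width, which is the discrepancy noted for the A100 above.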


View/Reply Online (#120795): https://edk2.groups.io/g/devel/message/120795