Hi,

I'm trying to get a better understanding of how libvirt VMs interact with the 
default QEMU setting for pci-hole64-size on q35 machines, to assess why my 
libvirt VMs behave differently from a similarly configured lxd VM. As I 
understand it, both libvirt and lxd are using qemu q35 VMs under the hood, and 
both are inheriting their pci-hole64-size from qemu's default setting (correct 
me if that's wrong), but in my tests, I'm getting different behavior from them. 
I know lxd is probably out of scope from the libvirt project perspective, so 
consider this more of a libvirt question w/ some added lxd context.

All of this is on a DGX B200 host, which contains large (~180GB VRAM) GPUs.

With libvirt/virt-install, I created a q35 virtual machine with CPU host 
passthrough and 1 or more GPUs passed-through via --host-device. Without 
additional modifications, this works as expected, and I can initialize the GPU 
driver in the VM and run nvidia-smi.

With lxd (which creates a q35 virtual machine with CPU host passthrough by 
default), I attached 1 GPU via "lxc config device add passthroughtest gpu gpu 
pci=1b:00.0". On that machine, the pci-hole64-size is too small by default, 
since I see these in my dmesg:
[    1.099110] pci 0000:00:01.5: bridge window [mem size 0x6000000000 64bit pref]: can't assign; no space
[    1.120274] pci 0000:00:01.5: bridge window [mem size 0x6000000000 64bit pref]: can't assign; no space
[    1.183281] pci 0000:06:00.0: BAR 2 [mem size 0x4000000000 64bit pref]: can't assign; no space
[    1.186320] pci 0000:06:00.0: BAR 0 [mem size 0x04000000 64bit pref]: can't assign; no space
[    1.189340] pci 0000:06:00.0: BAR 4 [mem size 0x02000000 64bit pref]: can't assign; no space

and I cannot initialize the GPU driver since the BARs weren't mapped correctly.
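
For scale, the sizes in those messages can be converted to GiB/MiB with a 
short Python sketch. The values are copied from the dmesg lines above; the 
helper name human() is only illustrative.

```python
GIB = 1 << 30
MIB = 1 << 20

# Sizes copied from the "can't assign; no space" dmesg lines above.
UNASSIGNED = [
    ("00:01.5 bridge window", 0x6000000000),
    ("06:00.0 BAR 2", 0x4000000000),
    ("06:00.0 BAR 0", 0x04000000),
    ("06:00.0 BAR 4", 0x02000000),
]


def human(size):
    """Format a byte count as whole GiB when possible, else MiB (illustrative helper)."""
    if size % GIB == 0:
        return f"{size // GIB} GiB"
    return f"{size // MIB} MiB"


for name, size in UNASSIGNED:
    print(f"{name}: {human(size)}")
```

So the prefetchable bridge window alone is 384 GiB, and the GPU's BAR 2 is 
256 GiB.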

When I apply a larger hole size to my lxd VM via `lxc config set 
passthroughtest raw.qemu=' -global q35-pcihost.pci-hole64-size=8192G'`, I don't 
see any "can't assign; no space" messages, and the driver works as expected.
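
One way to compare the two setups might be to look at which -global 
properties each running QEMU process was started with. Below is a minimal 
Python sketch, assuming Linux /proc and the short "-global driver.prop=value" 
form only; the function names and the sample argv are hypothetical. Anything 
set through a config file instead of the command line would not show up this 
way.

```python
def global_props(argv):
    """Collect '-global driver.prop=value' options from a QEMU argv list.

    Only the short 'driver.prop=value' form is handled (an assumption of
    this sketch); other spellings of -global are ignored.
    """
    props = {}
    for flag, arg in zip(argv, argv[1:]):
        if flag == "-global" and "=" in arg:
            key, value = arg.split("=", 1)
            props[key] = value
    return props


def read_cmdline(pid):
    """Return the argv of a running process by reading /proc (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return [a.decode(errors="replace") for a in f.read().split(b"\0") if a]


# Hand-written sample argv for illustration; not copied from a real VM.
sample = [
    "qemu-system-x86_64",
    "-machine", "q35",
    "-global", "q35-pcihost.pci-hole64-size=8192G",
]
key = "q35-pcihost.pci-hole64-size"
print(global_props(sample).get(key, "not set on the command line"))
```

For a live VM, global_props(read_cmdline(pid)) could be run against the PID 
of each QEMU process (typically needs the same user or root).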

My question about libvirt is - where (if at all) does libvirt interact with 
qemu's pci-hole64-size value? If libvirt does not automatically do something 
functionally similar to changing the hole size like I need to do above for lxd, 
and is in fact just using a qemu default value, is there some other related 
interaction happening in libvirt that might explain why my libvirt VMs don't 
require a manual change to pci-hole64-size, despite the fact that the relevant 
parts of the underlying qemu machine should be the same?
