Hi,

When bypass iommu is used together with a split irqchip, the kernel dumps two call stacks during boot (see attached), and as a result PCI devices attached to the root bus fall back to using the IOAPIC instead of MSI-X. This problem was initially noticed by Juro.

This only happens with kernel-irqchip=split, since kernel-irqchip=on implicitly disables interrupt remapping for the virtual IOMMU. As I understand it, default_bus_bypass_iommu=true makes PCI devices under the root bus disappear from the DRHD table. But when IRQ remapping is enabled, the kernel expects every device to appear in some DRHD table; otherwise the device's irq domain becomes NULL, its MSI setup fails, and its interrupts fall back to the IOAPIC. This doesn't look good: in split mode the IOAPIC is implemented in user space, which can be bad for performance.

I don't see any immediate solution to this problem, except adding intremap=off explicitly to the iommu device. Any ideas? Should we enhance the bypass iommu documentation to say that intremap should be disabled, or is there perhaps another way to fix the issue?

Thanks.

The qemu cmdline is like this:

$QEMU -m 4096 -smp 4 \
    -vga none \
    -drive file=$VM_GUEST,if=none,id=mydrive0 \
    -device virtio-blk-pci,drive=mydrive0 \
    -cpu host \
    -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
    -device virtio-net-pci,netdev=mynet0 \
    -netdev user,id=mynet0,hostfwd=tcp::2022-:22 \
    -device intel-iommu \
    -nographic \
    -kernel $BZIMAGE \
    -append "root=/dev/vda2 console=ttyS0 no_hash_pointers"

Full dmesg attached.
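
For reference, the intremap=off workaround mentioned above would look roughly like the sketch below. Only the intel-iommu device line differs from the command line in this mail. The fallback binary name qemu-system-x86_64 is an assumption for illustration, and the script only prints the resulting command instead of booting the guest, so treat it as a sketch rather than a tested recipe.

```shell
#!/bin/sh
# Sketch of the workaround: same invocation as above, but with interrupt
# remapping switched off on the virtual IOMMU.
# Assumption: qemu-system-x86_64 is used when $QEMU is not set.
QEMU=${QEMU:-qemu-system-x86_64}

# The only change: intremap=off on the intel-iommu device.
IOMMU_DEV="intel-iommu,intremap=off"

# Build the argument list in the positional parameters so quoting is preserved.
set -- -m 4096 -smp 4 \
    -vga none \
    -drive "file=$VM_GUEST,if=none,id=mydrive0" \
    -device virtio-blk-pci,drive=mydrive0 \
    -cpu host \
    -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
    -device virtio-net-pci,netdev=mynet0 \
    -netdev user,id=mynet0,hostfwd=tcp::2022-:22 \
    -device "$IOMMU_DEV" \
    -nographic \
    -kernel "$BZIMAGE" \
    -append "root=/dev/vda2 console=ttyS0 no_hash_pointers"

# Print the command line; run "$QEMU" "$@" instead to actually start the guest.
echo "$QEMU" "$@"
```

Whether this should just be documented or fixed some other way is still the open question.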
dmesg.gz
Description: application/gzip