Hi,

When bypass iommu is used together with a split irqchip, the guest
kernel dumps two call stacks during boot (see attached), and as a result
PCI devices attached to the root bus fall back to using the IOAPIC
instead of MSI-X. This problem was initially noticed by Juro.

This only happens with kernel-irqchip=split, since kernel-irqchip=on
implicitly disables interrupt remapping for the virtual IOMMU.

As I understand it, default_bus_bypass_iommu=true makes PCI devices
under the root bus disappear from the DRHD table. But when IRQ remapping
is enabled, the kernel expects every device to appear in some DRHD
table. Otherwise the device's irq domain becomes NULL, the device's MSI
setup fails, and its interrupts fall back to the IOAPIC. This doesn't
look good: in split mode the IOAPIC is implemented in user space, which
can hurt performance.
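
A rough way to see the fallback from inside the guest is to count how
many interrupt lines are routed through the IOAPIC versus MSI. This is
only a sketch: the helper name count_irq_types is made up for
illustration, and the exact chip names printed in /proc/interrupts vary
between kernel versions.

```shell
# count_irq_types [FILE]: count IO-APIC vs MSI/MSI-X interrupt lines.
# FILE defaults to /proc/interrupts (hypothetical helper, not part of
# QEMU or the kernel).
count_irq_types() {
  file="${1:-/proc/interrupts}"
  # grep -c prints 0 when nothing matches, so both counts are always set.
  ioapic=$(grep -c 'IO-APIC' "$file")
  msi=$(grep -c 'MSI' "$file")
  echo "IO-APIC=$ioapic MSI=$msi"
}
```

Running count_irq_types in the guest with and without bypass iommu
should show the virtio devices moving from the MSI count to the IO-APIC
count when the problem is present.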

I don't see any immediate solution to this problem, other than
explicitly adding intremap=off to the iommu device.
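
For reference, the workaround would look roughly like this. It is a
sketch only: the variable names MACHINE_OPTS and IOMMU_OPTS are just for
illustration, and the only real change from the command line in this
mail is the intremap=off property on intel-iommu.

```shell
# Same machine options as in the report.
MACHINE_OPTS="q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true"
# Workaround: turn interrupt remapping off on the virtual IOMMU.
IOMMU_OPTS="intel-iommu,intremap=off"
# These would replace the -machine and "-device intel-iommu" arguments.
echo "-machine $MACHINE_OPTS -device $IOMMU_OPTS"
```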

Any ideas? Should we extend the bypass iommu documentation to say that
intremap should be disabled, or is there perhaps another way to fix the
issue? Thanks.

The QEMU command line is:
$QEMU -m 4096 -smp 4 \
  -vga none \
  -drive file=$VM_GUEST,if=none,id=mydrive0 \
  -device virtio-blk-pci,drive=mydrive0 \
  -cpu host \
  -machine q35,accel=kvm,kernel-irqchip=split,default_bus_bypass_iommu=true \
  -device virtio-net-pci,netdev=mynet0 \
  -netdev user,id=mynet0,hostfwd=tcp::2022-:22 \
  -device intel-iommu \
  -nographic \
  -kernel $BZIMAGE \
  -append "root=/dev/vda2 console=ttyS0 no_hash_pointers"

Full dmesg attached.

Attachment: dmesg.gz
Description: application/gzip
