On Wed, 6 Nov 2019 00:29:52 +0100
Samuel Ortiz <samuel.or...@intel.com> wrote:
> On Tue, Nov 05, 2019 at 01:21:48PM -0700, Alex Williamson wrote:
> > On Fri, 18 Oct 2019 05:48:49 +0000
> > "Boeuf, Sebastien" <sebastien.bo...@intel.com> wrote:
> >
> > > Hi folks,
> > >
> > > I have been recently working with VFIO, and particularly trying to
> > > achieve device passthrough through multiple layers of virtualization.
> > >
> > > I wanted to assess QEMU's performance with nested VFIO, using the
> > > emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> > > physical devices work when I pass them through, attached to the
> > > emulated Intel IOMMU. Using regular VFIO works properly, but as soon
> > > as I enable the virtual IOMMU, the driver fails to probe (I tried on
> > > two different machines with different types of NIC).
> > >
> > > So I was wondering if someone was aware of any issue with using both
> > > VFIO and virtual Intel IOMMU with QEMU? I'm sure I might be missing
> > > something obvious but I couldn't find it so far.
> >
> > It's not something I test regularly, but I'm under the impression that
> > nested device assignment does work. When you say the driver fails to
> > probe, which driver is that, the endpoint driver in the L2 guest or
> > vfio-pci in the L1 guest? Perhaps share your XML or command line?
>
> This is fixed now. Apparently the iommu device needs to be passed
> _before_ the other devices on the command line. We managed to make it
> work as expected.

Good news!

> Sebastien and Yi Liu figured this out but for some reason the
> thread moved to vfio-users-boun...@redhat.com.

Yes, I see some uncaught bounce notifications; it looks like Yi's
initial reply was to vfio-users-bounces. Yi, you might want to check
your mailer configuration. For posterity/follow-up, I'll paste the
final message from the bounce notification below. Thanks,

Alex

On Mon, 28 Oct 2019 08:13:23 +0000
"Liu, Yi L" <yi.l....@intel.com> wrote:

> Hi Sebastien,
>
> That's great that it works for you. I remember there was an effort
> to fix this in the community, but I cannot recall whether it was
> documented. If not, I think I can work with the community to make it
> clear.
>
> Regards,
> Yi Liu
>
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:17 PM
> To: Liu, Yi L <yi.l....@intel.com>
> Cc: Ortiz, Samuel <samuel.or...@intel.com>; vfio-users-boun...@redhat.com;
> Bradford, Robert <robert.bradf...@intel.com>
> Subject: Re: [vfio-users] Nested VFIO with QEMU
>
> Hi Yi Liu,
>
> Yes, that was it :)
> Thank you very much for your help!
>
> Is it documented somewhere that parameter order matters?
>
> Thanks,
> Sebastien
>
> On Fri, 2019-10-25 at 09:52 +0800, Liu, Yi L wrote:
> Hi Sebastien,
>
> I guess the cmdline is the cause. You should put the intel-iommu
> exposure prior to the other devices, as below.
>
> -drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
> -device intel-iommu,intremap=on,caching-mode=on \
> -device virtio-blk-pci,drive=drive0,scsi=off \
> -device virtio-rng-pci \
> -device vfio-pci,host=00:19.0 \
>
> Regards,
> Yi Liu
>
> From: Boeuf, Sebastien
> Sent: Friday, October 25, 2019 7:14 AM
> To: Liu, Yi L <yi.l....@intel.com>
> Cc: Ortiz, Samuel <samuel.or...@intel.com>; vfio-users-boun...@redhat.com;
> Bradford, Robert <robert.bradf...@intel.com>
> Subject: Re: [vfio-users] Nested VFIO with QEMU
>
> Hi Yi Liu,
>
> On Tue, 2019-10-22 at 11:01 +0800, Liu, Yi L wrote:
> > Hi Sebastien,
> >
> > > From: vfio-users-boun...@redhat.com [mailto:vfio-users-boun...@redhat.com]
> > > On Behalf Of Boeuf, Sebastien
> > > Sent: Friday, October 18, 2019 1:49 PM
> > > To: vfio-users@redhat.com
> > > Cc: Ortiz, Samuel <samuel.or...@intel.com>; Bradford, Robert
> > > <robert.bradf...@intel.com>
> > > Subject: [vfio-users] Nested VFIO with QEMU
> > >
> > > Hi folks,
> > >
> > > I have been recently working with VFIO, and particularly trying to
> > > achieve device passthrough through multiple layers of virtualization.
> > >
> > > I wanted to assess QEMU's performance with nested VFIO, using the
> > > emulated Intel IOMMU device. Unfortunately, I cannot make any of my
> > > physical devices work when I pass them through, attached to the
> > > emulated Intel IOMMU. Using regular VFIO works properly, but as soon
> >
> > Sorry, what does regular VFIO mean here?
>
> Sorry, what I called "regular VFIO" is the case where VFIO is not run
> along with a vIOMMU.
>
> >
> > > as I enable the virtual IOMMU, the driver fails to probe (I tried on
> > > two different machines with different types of NIC).
> >
> > Ok, so regular VFIO means passing a device through to a VM which has
> > no vIOMMU?
>
> Yes.
>
> >
> > > So I was wondering if someone was aware of any issue with using both
> > > VFIO and virtual Intel IOMMU with QEMU? I'm sure I might be missing
> > > something obvious but I couldn't find it so far.
> >
> > I've been using VFIO and vIOMMU for a long time; so far it has been
> > pretty stable for me. I would be pleased to help here. Could you
> > paste your QEMU cmdline? It would also be helpful to paste the error
> > log you got when the failure happened.
>
> So here is the QEMU command line I am using:
>
> qemu-system-x86_64 \
> -machine q35,accel=kvm,kernel_irqchip=split \
> -bios /home/sebastien/workloads/OVMF.fd \
> -smp sockets=1,cpus=1,cores=1 \
> -cpu host \
> -m 1024 \
> -vga none \
> -nographic \
> -kernel ~/bzImage \
> -append "console=ttyS0 reboot=k root=/dev/vda3 kvm-intel.nested=1
> vfio_iommu_type1.allow_unsafe_interrupts intel_iommu=on rw" \
> -drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
> -device virtio-blk-pci,drive=drive0,scsi=off \
> -device virtio-rng-pci \
> -device vfio-pci,host=00:19.0 \
> -device intel-iommu,intremap=on,caching-mode=on
>
> My goal is simply to pass the device, which is a fairly simple Intel
> NIC, into the guest. Unfortunately, after the VM boots, I can see the
> interface going up and down.
> It basically keeps resetting; after a few seconds I got the following
> trace:
>
> [ 14.223213] NETDEV WATCHDOG: enp0s3 (e1000e): transmit queue 0 timed out
> [ 14.224543] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x200/0x210
> [ 14.224543] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc3+ #169
> [ 14.224543] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> [ 14.224543] RIP: 0010:dev_watchdog+0x200/0x210
> [ 14.224543] Code: 00 49 63 4e e8 eb 98 4c 89 ef c6 05 ba d5 95 00 01 e8 f4 e4 fc ff 89 d9 4c 89 ee 48 c7 c7 98 1e ea 81 48 89 c2 e8 05 35 9d ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 90 55 48 89 e5 41 57
> [ 14.224543] RSP: 0018:ffffc90000003e88 EFLAGS: 00010282
> [ 14.224543] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
> [ 14.224543] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
> [ 14.224543] RBP: ffffc90000003eb8 R08: 00000000000001ee R09: ffffffff8227ee38
> [ 14.224543] R10: 000000000000004c R11: ffffc90000003ce8 R12: 0000000000000001
> [ 14.224543] R13: ffff88803bcb0000 R14: ffff88803bcb03b8 R15: ffff88803bc92680
> [ 14.224543] FS: 0000000000000000(0000) GS:ffff88803f400000(0000) knlGS:0000000000000000
> [ 14.224543] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 14.224543] CR2: 00007fc745758af0 CR3: 000000003c9f4002 CR4: 00000000000606b0
> [ 14.224543] Call Trace:
> [ 14.224543] <IRQ>
> [ 14.224543] ? pfifo_fast_enqueue+0x130/0x130
> [ 14.224543] call_timer_fn.isra.30+0x16/0x80
> [ 14.224543] run_timer_softirq+0x323/0x360
> [ 14.224543] ? clockevents_program_event+0x8e/0xf0
> [ 14.224543] __do_softirq+0xcf/0x21e
> [ 14.224543] irq_exit+0x9e/0xa0
> [ 14.224543] smp_apic_timer_interrupt+0x66/0xa0
> [ 14.224543] apic_timer_interrupt+0xf/0x20
> [ 14.224543] </IRQ>
> [ 14.224543] RIP: 0010:default_idle+0x12/0x20
> [ 14.224543] Code: 48 83 c0 22 48 89 44 24 28 eb c7 e8 48 fb 90 ff 90 90 90 90 90 90 90 90 55 48 89 e5 e9 07 00 00 00 0f 00 2d e2 67 41 00 fb f4 <5d> c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 55 65 48 8b 04 25 40 5d
> [ 14.224543] RSP: 0018:ffffffff82003e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [ 14.224543] RAX: ffffffff817f7f10 RBX: 0000000000000000 RCX: 0000000000000001
> [ 14.224543] RDX: 0000000000001a86 RSI: 0000000000000087 RDI: ffff88803f41c700
> [ 14.224543] RBP: ffffffff82003e40 R08: 0000000000018470 R09: ffff88803f41f4c0
> [ 14.224543] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff820b5590
> [ 14.224543] R13: 0000000000000000 R14: 0000000000000000 R15: 000000003e80f000
> [ 14.224543] ? __cpuidle_text_start+0x8/0x8
> [ 14.224543] arch_cpu_idle+0x10/0x20
> [ 14.224543] default_idle_call+0x21/0x30
> [ 14.224543] do_idle+0x1d5/0x1f0
> [ 14.224543] cpu_startup_entry+0x18/0x20
> [ 14.224543] rest_init+0xa9/0xab
> [ 14.224543] arch_call_rest_init+0x9/0xc
> [ 14.224543] start_kernel+0x451/0x470
> [ 14.224543] x86_64_start_reservations+0x29/0x2b
> [ 14.224543] x86_64_start_kernel+0x71/0x74
> [ 14.224543] secondary_startup_64+0xa4/0xb0
> [ 14.224543] ---[ end trace f8ed580b43c5ffcc ]---
> [ 14.224543] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
>
> And here are the logs about the reset happening:
>
> [ 20.427211] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 25.996633] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [ 32.518501] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 38.028584] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [ 44.579713] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 50.060585] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [ 56.659831] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 62.092632] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [ 68.718399] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 74.124622] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
> [ 80.798753] e1000e: enp0s3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> [ 86.156624] e1000e 0000:00:03.0 enp0s3: Reset adapter unexpectedly
>
> Thanks,
> Sebastien
>
> > >
> > > Thanks,
> > > Sebastien
> >
> > Best Wishes,
> > Yi Liu
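
A closing note for readers who find this thread in the archives: the fix
is purely about option ordering. As I understand QEMU's behavior at the
time, devices realized before the intel-iommu device are wired to the
system address space rather than to the vIOMMU, so the DMA mappings the
guest programs never apply to the assigned NIC and its DMA goes astray,
which matches the reset loop above. For reference, here is the same
command line from the thread with the single change applied, i.e.
-device intel-iommu moved ahead of every other -device option (the
firmware, kernel, and image paths and the host address 00:19.0 are the
thread's own values, not recommendations):

qemu-system-x86_64 \
-machine q35,accel=kvm,kernel_irqchip=split \
-bios /home/sebastien/workloads/OVMF.fd \
-smp sockets=1,cpus=1,cores=1 \
-cpu host \
-m 1024 \
-vga none \
-nographic \
-kernel ~/bzImage \
-append "console=ttyS0 reboot=k root=/dev/vda3 kvm-intel.nested=1
vfio_iommu_type1.allow_unsafe_interrupts intel_iommu=on rw" \
-device intel-iommu,intremap=on,caching-mode=on \
-drive if=none,id=drive0,format=raw,file=/home/sebastien/clear-kvm.img \
-device virtio-blk-pci,drive=drive0,scsi=off \
-device virtio-rng-pci \
-device vfio-pci,host=00:19.0

Note that caching-mode=on is what allows vfio to shadow the guest's
IOMMU mappings, and intremap=on relies on the split irqchip selected by
kernel_irqchip=split, so both options should be kept when reproducing
this setup.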