On Wed, 31 Aug 2016 13:56:20 -0600 Alex Williamson <alex.william...@redhat.com> wrote:
> On Tue, 19 Jul 2016 15:38:23 +0800
> Zhou Jie <zhoujie2...@cn.fujitsu.com> wrote:
>
> > From: Chen Fan <chen.fan.f...@cn.fujitsu.com>
> >
> > When assigning a vfio device with AER enabled, we must check whether
> > the device supports a host bus reset (ie. hot reset) as this may be
> > used by the guest OS in order to recover the device from an AER
> > error. QEMU must therefore have the ability to perform a physical
> > host bus reset using the existing vfio APIs in response to a virtual
> > bus reset in the VM. A physical bus reset affects all of the devices
> > on the host bus, therefore we place a few simplifying configuration
> > restriction on the VM:
> >
> > - All physical devices affected by a bus reset must be assigned to
> >   the VM with AER enabled on each and be configured on the same
> >   virtual bus in the VM.
> >
> > - No devices unaffected by the bus reset, be they physical, emulated,
> >   or paravirtual may be configured on the same virtual bus as a
> >   device supporting AER signaling through vfio.
> >
> > In other words users wishing to enable AER on a multifunction device
> > need to assign all functions of the device to the same virtual bus
> > and enable AER support for each device. The easiest way to
> > accomplish this is to identity map the physical functions to virtual
> > functions with multifunction enabled on the virtual device.
>
> Why am I able to start the following VM with aer=on for the vfio-pci
> devices?
>
> # lspci -tv
> -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
>            +-01.0  Device 1234:1111
>            +-1c.0-[01]--
>            +-1d.0-[02]--+-01.0  Intel Corporation 82576 Gigabit Network Connection
>            |            \-01.1  Intel Corporation 82576 Gigabit Network Connection
> ...
>
> # lspci -vvv -s 1d.0
> 00:1d.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge (prog-if 00 [Normal decode])
>
> The devices are behind a PCIe-to-PCI bridge, so shouldn't specifying
> aer=on for the vfio-pci devices cause a configuration error?
>
> commandline:
>
> /home/alwillia/local/bin/qemu-system-x86_64
>   -name guest=rhel7-q35,debug-threads=on
>   -S
>   -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-rhel7-q35/master-key.aes
>   -machine pc-q35-2.7,accel=kvm,usb=off,vmport=off
>   -cpu IvyBridge
>   -m 8192
>   -realtime mlock=off
>   -smp 6,sockets=1,cores=6,threads=1
>   -uuid b20b28b4-9304-4e11-9ffa-0367aeb44afb
>   -no-user-config
>   -nodefaults
>   -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-11-rhel7-q35/monitor.sock,server,nowait
>   -mon chardev=charmonitor,id=monitor,mode=control
>   -rtc base=utc,driftfix=slew
>   -global kvm-pit.lost_tick_policy=discard
>   -no-hpet
>   -no-shutdown
>   -global ICH9-LPC.disable_s3=1
>   -global ICH9-LPC.disable_s4=1
>   -boot strict=on
>   -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e
>   -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1
>   -device pci-bridge,chassis_nr=3,id=pci.3,bus=pcie.0,addr=0x1d
>   -device ioh3420,port=0xe0,chassis=4,id=pci.4,bus=pcie.0,addr=0x1c
>   -device ich9-usb-ehci1,id=usb,bus=pci.2,addr=0x3.0x7
>   -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.2,multifunction=on,addr=0x3
>   -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.2,addr=0x3.0x1
>   -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.2,addr=0x3.0x2
>   -drive file=/dev/rhel/rhel7-q35,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
>   -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>   -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28
>   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:50:ec:0d,bus=pci.2,addr=0x1
>   -chardev pty,id=charserial0
>   -device isa-serial,chardev=charserial0,id=serial0
>   -device usb-tablet,id=input0,bus=usb.0,port=1
>   -vnc 127.0.0.1:0
>   -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1
>   -device intel-hda,id=sound0,bus=pci.2,addr=0x2
>   -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
>   -device vfio-pci,aer=on,host=07:00.0,id=hostdev0,bus=pci.3,multifunction=on,addr=0x1
>   -device vfio-pci,aer=on,host=07:00.1,id=hostdev1,bus=pci.3,addr=0x1.0x1
>   -msg timestamp=on
>

I had to move to a different system where I could actually inject an
AER error, and created a config similar to the one above but with the
82576 ports downstream of the ioh3420 root port.
When I inject a malformed TLP uncorrectable error, my RHEL7.2 guest
does this:

[   35.995645] pcieport 0000:00:1c.0: AER: Multiple Uncorrected (Fatal) error received: id=0200
[   35.998483] igb 0000:02:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Unaccessible, id=0200(Unregistered Agent ID)
[   36.001965] igb 0000:02:00.0 enp2s0f0: PCIe link lost, device now detached
[   36.015092] igb 0000:02:00.1 enp2s0f1: PCIe link lost, device now detached
[   39.133185] igb 0000:02:00.0: enabling device (0000 -> 0002)
[   40.071245] igb 0000:02:00.1: enabling device (0000 -> 0002)
[   41.014451] BUG: unable to handle kernel paging request at 0000000000003818
[   41.015969] IP: [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[   41.017507] PGD 367e2067 PUD 7ae56067 PMD 0
[   41.018497] Oops: 0002 [#1] SMP
[   41.019242] Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter snd_hda_codec_generic snd_hda_intel snd_hda_codec ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support bochs_drm snd_pcm syscopyarea sysfillrect sysimgblt ttm virtio_balloon snd_timer snd igb drm_kms_helper soundcore ptp pps_core i2c_algo_bit i2c_i801 dca drm shpchp lpc_ich mfd_core pcspkr i2c_core parport_pc parport ip_tables xfs libcrc32c virtio_blk virtio_console virtio_net ahci libahci crc32c_intel serio_raw libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
[   41.040590] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 3.10.0-327.el7.x86_64 #1
[   41.042180] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[   41.044635] Workqueue: events aer_isr
[   41.045478] task: ffff880179435080 ti: ffff880179680000 task.ti: ffff880179680000
[   41.047097] RIP: 0010:[<ffffffffa02b438d>] [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[   41.049151] RSP: 0018:ffff880179683bf8 EFLAGS: 00010246
[   41.050260] RAX: 0000000000003818 RBX: 0000000000000000 RCX: 0000000000003818
[   41.051747] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00000000002896b3
[   41.053268] RBP: ffff880179683c20 R08: 0000000001010100 R09: 00000000ffffffe7
[   41.054730] R10: ffffea0001eb6100 R11: ffffffffa02afa31 R12: 0000000000000000
[   41.056201] R13: ffff880035dbc8c0 R14: ffff880175d03f80 R15: 000000017716e000
[   41.057673] FS: 0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[   41.059337] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   41.060548] CR2: 0000000000003818 CR3: 0000000178331000 CR4: 00000000000006f0
[   41.062028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   41.063534] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   41.065025] Stack:
[   41.065473]  ffff880035dbc8c0 ffff880035dbce70 0000000000000001 ffff880035dbc8c8
[   41.067119]  ffff880035dbce70 ffff880179683c80 ffffffffa02b8a77 fefdf27269fb3cd8
[   41.068781]  2009f9ee3386436f eb9e4e66756bbfdd 34002f8114a5d65f 9535990856231c4b
[   41.094179] Call Trace:
[   41.118688]  [<ffffffffa02b8a77>] igb_configure+0x267/0x450 [igb]
[   41.144286]  [<ffffffffa02b94f1>] igb_up+0x21/0x1a0 [igb]
[   41.170606]  [<ffffffffa02b96a7>] igb_io_resume+0x37/0x70 [igb]
[   41.195846]  [<ffffffff813381e0>] ? pci_cleanup_aer_uncorrect_error_status+0x90/0x90
[   41.221767]  [<ffffffff81338228>] report_resume+0x48/0x60
[   41.246455]  [<ffffffff8131e359>] pci_walk_bus+0x79/0xa0
[   41.270722]  [<ffffffff813381e0>] ? pci_cleanup_aer_uncorrect_error_status+0x90/0x90
[   41.296747]  [<ffffffff813382f0>] broadcast_error_message+0xb0/0x100
[   41.321552]  [<ffffffff81338509>] do_recovery+0x1c9/0x280
[   41.345507]  [<ffffffff81338f58>] aer_isr+0x348/0x430
[   41.368851]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[   41.392157]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[   41.416852]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[   41.441577]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
[   41.465029]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[   41.488341]  [<ffffffff81645858>] ret_from_fork+0x58/0x90
[   41.511247]  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[   41.535442] Code: c1 49 89 4e 30 49 8b 85 b8 05 00 00 48 85 c0 0f 84 39 01 00 00 81 c2 10 38 00 00 48 63 d2 48 01 d0 31 d2 89 10 49 8b 46 30 31 d2 <89> 10 41 8b 95 3c 06 00 00 b8 14 01 10 02 83 fa 05 74 0b 83 fa
[   41.587718] RIP [<ffffffffa02b438d>] igb_configure_tx_ring+0x14d/0x280 [igb]
[   41.610872]  RSP <ffff880179683bf8>
[   41.632301] CR2: 0000000000003818

And then it reboots. So what RAS improvement have we bought ourselves
here? What endpoints have you tested with this? Which ones recovered
reliably?

Thanks,
Alex
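
For reference, the two restrictions quoted from the patch description
could be expressed as a check roughly like the one below. This is only
a sketch in Python and not QEMU's code: VirtualDevice,
check_aer_bus_config and the reset_group argument are hypothetical
names, and the set of host devices that share a bus reset is assumed
to be supplied by the caller rather than queried from vfio.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class VirtualDevice:
    """One -device entry (hypothetical model, not a QEMU structure)."""
    name: str                        # id= value
    virtual_bus: str                 # bus= value
    host_addr: Optional[str] = None  # host= value; None if emulated/paravirtual
    aer: bool = False                # True when aer=on was given


def check_aer_bus_config(devices: Iterable[VirtualDevice],
                         reset_group: Set[str]) -> List[str]:
    """Return a list of violations of the two quoted restrictions.

    reset_group is the set of host addresses affected by one host bus
    reset (assumed input).
    """
    devices = list(devices)
    in_group = [d for d in devices if d.host_addr in reset_group]
    if not any(d.aer for d in in_group):
        return []  # AER not requested for this reset group

    problems = []
    assigned = {d.host_addr: d for d in in_group}

    # Restriction 1: every affected physical device is assigned with AER
    # enabled, and all of them share a single virtual bus.
    for addr in sorted(reset_group):
        dev = assigned.get(addr)
        if dev is None:
            problems.append(f"{addr}: affected by the bus reset but not assigned")
        elif not dev.aer:
            problems.append(f"{addr}: assigned without aer=on")
    buses = {d.virtual_bus for d in in_group}
    if len(buses) > 1:
        problems.append("affected devices span several virtual buses: "
                        + ", ".join(sorted(buses)))

    # Restriction 2: no device unaffected by the reset shares those buses.
    for d in devices:
        if d.virtual_bus in buses and d.host_addr not in reset_group:
            problems.append(f"{d.name}: unaffected device on AER bus {d.virtual_bus}")
    return problems


# The two vfio-pci devices from the command line quoted above.
vm = [
    VirtualDevice("hostdev0", "pci.3", "07:00.0", aer=True),
    VirtualDevice("hostdev1", "pci.3", "07:00.1", aer=True),
]
print(check_aer_bus_config(vm, {"07:00.0", "07:00.1"}))
```

Note that the sketch never looks at the type of the virtual bus, so the
configuration quoted above passes it with no violations reported.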