On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote: > On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote: > > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote: > > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote: > > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari > > > > > capability of > > > > > pcie devices > > > > > > > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, arei.gong...@huawei.com > > > > > wrote: > > > > > > From: Gonglei <arei.gong...@huawei.com> > > > > > > > > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe > > > > > > ports. ARI Forwarding enable setting at firmware/OS Control handoff. > > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI > > > > > > Device can respond to Configuration Space accesses under what it > > > > > > interprets as being different Device Numbers, and its Functions can > > > > > > be aliased under multiple Device Numbers, generally leading to > > > > > > undesired behavior. > > > > > > > > > > So what is the undesired behaviour? > > > > > Did you observe such? > > > > > > > > I just observe the PCI device don't work, and the dmesg show me: > > > > > > > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - 4) > > > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4) > > > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering on > > > > due to button press. > > > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status > > > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 - > > > > 4) > > > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 - > > > > 4) > > > > > > This seems very much like the symptoms I see when I use hotplug after > > > the latest changes to the hotplug code - do you have something that > > > confirms this has something to do with wrong interpretation of device > > > IDs? My suspicion is that something has gone wrong or is missing in the > > > hotplug logic but I havent had time to dig deeper into it yet. > > Can you please describe me the steps to reproduce the issue? > > Hmm, while trying to reproduce again I realize my hotplug issues are not > the same as the one Gonglei reports, let me come back to that in a > separate mail later, > > My main point here is that I don't see how this particular fix would > alleviate Gonglei's issue, as it does not seem to get triggered unless > there's a bug in the emulated port/switch? > > Gonglei, I assume you are use the TI x3130 in your example: > IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots > provided by the TI x3130, and according to the spec: > > http://www.ti.com/lit/gpn/xio3130 > > these slots appear as separate buses, which means that all devices will > have devfn 0.0 but on different buses, eg. if you have all three switch > slots (not to be confused with the slot number given by the PCI_SLOT() > macro) populated and no other secondary buses before it, you should see > the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0 > and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the > current Qemu model. > > So unless your device itself exposes function# > 8 and is not ARI > capable (which would be a non-compliant device as far as I read the > standard) you should never be able to see any device in any of the > downstream ports have PCI_SLOT(devfn) != 0 ? > > With qemu master + my SR/IOV patch set + the igb patch (just to have an > ARI capable device to play with) here: > > https://github.com/knuto/qemu, branch sriov_patches_v3 > > and a guest running F20 I am able to boot with a device (such as for > instance an e1000) inserted in a root port, I am also able to hot plug > it from the qemu monitor, eg. > > qemu-kvm ... \ > -device ioh3420,slot=0,id=pcie_port.0 > > ... > > (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne > (qemu) device_del ne > > In both cases the ARIfwd bit is disabled by default and not enabled by > the port driver (just as I would expect) so at least your comment > is wrong as far as I can see. > > Booting with a two-port downstream switch with no devices plugged, I see > the same, ARIfwd is not enabled on the downstream ports as long as no > device is in any slots, which is as expected, isn't it? > > qemu-kvm ... \ > -device x3130-upstream,id=upsw \ > -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \ > -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1 > ... > > The upstream port does not support ARIfwd in the QEMU model, which I > suppose is correct as it will only contain individual downstream > devices, which expose new buses but never have more than one function. > > However, either booting or hotplugging an e1000 into the downstream > switch, eg. > > (qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne > > with my current guest image kernel 3.13.10-200.fc20.x86_64+debug I get a > NULL pointer Oops in the guest, > > [ 0.520009] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000038 > [ 0.521000] IP: [<ffffffff813d942d>] pcie_aspm_init_link_state+0x6ed/0x7c0 > [ 0.521000] PGD 0 > [ 0.521000] Oops: 0000 [#1] SMP
here is the gdb version of the stack trace: 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:530 530 parent = pdev->bus->parent->self->link_state; (gdb) where #0 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:530 #1 pcie_aspm_init_link_state (pdev=0xffff880178cec520) at drivers/pci/pcie/aspm.c:578 #2 0xffffffff813c6bbd in pci_scan_slot (bus=bus@entry=0xffff880178cad388, devfn=devfn@entry=0) at drivers/pci/probe.c:1486 #3 0xffffffff813e0d54 in pciehp_configure_device (p_slot=p_slot@entry=0xffff880173982760) at drivers/pci/hotplug/pciehp_pci.c:54 #4 0xffffffff813e05cb in board_added (p_slot=0xffff880173982760) at drivers/pci/hotplug/pciehp_ctrl.c:223 #5 pciehp_enable_slot (p_slot=p_slot@entry=0xffff880173982760) at drivers/pci/hotplug/pciehp_ctrl.c:510 #6 0xffffffff813e0ac0 in pciehp_power_thread (work=<optimized out>) at drivers/pci/hotplug/pciehp_ctrl.c:308 #7 0xffffffff8109bc31 in process_one_work (worker=worker@entry=0xffff8801730fd728, work=0xffff8801731bbc30) at kernel/workqueue.c:2214 #8 0xffffffff8109c1eb in worker_thread (__worker=0xffff8801730fd728) at kernel/workqueue.c:2340 #9 0xffffffff810a43ef in kthread (_create=0xffff88017336edc8) at kernel/kthread.c:207 #10 <signal handler called> #11 0x0000000000000000 in irq_stack_union () #12 0x0000000000000000 in ?? () (gdb) p pdev->bus->parent->self $2 = (struct pci_dev *) 0x0 <irq_stack_union> (gdb) I observe the same behaviour with guest kernel 3.16.3-200.fc20.x86_64. Knut > The exact same happens if I use the ARI capable igb device instead. > > I will upgrade the guest to a newer image (what kernel are you running with, > Gonglei?) > > lspci reports this on the guest (before I try to add any device in the > downstream port(s)): > > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM > Controller > 00:01.0 VGA compatible controller: Cirrus Logic GD 5446 > 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device > 00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem > 00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express > Root Port 0 (rev 02) > 00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) > (rev 02) > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller > (rev 02) > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port > SATA Controller [AHCI mode] (rev 02) > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev > 02) > 02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) > (rev 01) > 02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) > (rev 01) > > # lspci -t > -[0000:00]-+-00.0 > +-01.0 > +-02.0 > +-03.0 > +-04.0-[01]-- > +-05.0-[02-04]--+-00.0-[03]-- > | \-01.0-[04]-- > +-1f.0 > +-1f.2 > \-1f.3 > > eg. the devices plugged into the downstream ports will end up at 03:00.0 and > 04:00.0, > the info qtree command verifies that the e1000 insert ends up in 03:00.0 as > expected. > > Similar if I instead use > > (qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2 > > I see this with info qtree: > > class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100) > > Anyway I suppose this indicates that something is not right/is missing > with the downstream switch emulation, and not with the (generic) way the > ARIfwd cap/ctrl bit works > > > It should work correctly by now, maybe a "surprise removal/addition" > > warning and nothing more. > > > > Thank you, > > Marcel > > > > > > > > I am just concerned that this might alleviate the symptoms you see but > > > not fix the problem itself, > > > > > > Knut > > > > > > > > Please make this explicit in the commit log. > > > > > > > > > Sorry, I copied the description from PCIe spec. :( > > > > > > > > IMPLEMENTATION NOTE at Page 19: > > > > > > > > https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-interpretation-070604.pdf > > > > > > > > > > > > > > > > > > > > > > > > > So, for PCI devices attached in PCIe root ports or downstream pots, > > > > > > we should assure that its slot is not non-zero. > > > > > > > > > For PCIe devices, which > > > > > > ARP capability is not enabled, we also should assure that its slot > > > > > > is not non-zero. > > > > > > > > > > not non zero => zero > > > > > > > > > OK. > > > > > > > > > > > > > > > > Signed-off-by: Gonglei <arei.gong...@huawei.com> > > > > > > --- > > > > > > hw/pci/pci.c | 4 ++++ > > > > > > hw/pci/pcie.c | 51 > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > include/hw/pci/pcie.h | 1 + > > > > > > 3 files changed, 56 insertions(+) > > > > > > > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > > > > index 6ce75aa..22b7ca0 100644 > > > > > > --- a/hw/pci/pci.c > > > > > > +++ b/hw/pci/pci.c > > > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState *qdev) > > > > > > } > > > > > > } > > > > > > > > > > > > + if (pcie_cap_slot_check(bus, pci_dev)) { > > > > > > + return -1; > > > > > > + } > > > > > > + > > > > > > /* rom loading */ > > > > > > is_default_rom = false; > > > > > > if (pci_dev->romfile == NULL && pc->romfile != NULL) { > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > > > index 1babddf..b82211a 100644 > > > > > > --- a/hw/pci/pcie.c > > > > > > +++ b/hw/pci/pcie.c > > > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t > > > > > > offset, > > > > > uint16_t nextfn) > > > > > > offset, PCI_ARI_SIZEOF); > > > > > > pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & > > > > > > 0xff) << 8); > > > > > > } > > > > > > + > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev) > > > > > > +{ > > > > > > + Object *obj = OBJECT(bus); > > > > > > + > > > > > > + if (pci_bus_is_root(bus)) { > > > > > > + return 0; > > > > > > + } > > > > > > + > > > > > > + if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) { > > > > > > + DeviceState *parent = qbus_get_parent(BUS(obj)); > > > > > > + PCIDevice *pci_dev = PCI_DEVICE(parent); > > > > > > + uint8_t port_type; > > > > > > + /* > > > > > > + * Root ports and downstream ports of switches are the hot > > > > > > + * pluggable ports in a PCI Express hierarchy. > > > > > > + * PCI Express supports chip-to-chip interconnect, a PCIe > > > > > > link can > > > > > > + * only connect one PCI device/Switch/EndPoint or > > > > > > PCI-bridge. > > > > > > + * > > > > > > + * 7.3. Configuration Transaction Rules (PCI Express > > > > > > specification > > > > > 3.0) > > > > > > + * 7.3.1. Device Number > > > > > > + * > > > > > > + * Downstream Ports that do not have ARI Forwarding enabled > > > > > must > > > > > > + * associate only Device 0 with the device attached to the > > > > > > Logical > > > > > Bus > > > > > > + * representing the Link from the Port. > > > > > > + * > > > > > > + * In QEMU, ARI Forwarding is enabled default at emulation > > > > > > of > > > > > PCIe > > > > > > > > > > s/enabled default/enabled by default/ > > It is not when I try this (as discussed above) > The capability is there but it is disabled by default. > > > > > > > > > > OK. > > > > > > > > > > + * ports. ARI Forwarding enable setting at firmware/OS > > > > > > Control > > > > > handoff. > > > > > > > > > > > > > > > can parse last sentence. drop it? > > > > > > > > > OK. > > > > > > > > > > + * If the bit is Set when a non-ARI Device is present, the > > > > > > non-ARI > > > > > > + Device can respond to Configuration Space accesses under > > > > > what it > > > > > > + * interprets as being different Device Numbers, and its > > > > > > Functions > > > > > can > > > > > > + * be aliased under multiple Device Numbers, generally > > > > > > leading to > > > > > > + * undesired behavior. > > I don't understand how this is possible to achieve. > > Knut > > > > > > I don't think any badness really happens. > > > > > Did you observe any? > > > > > > > > > Same with above explanation. > > > > > > > > > > > > > > > > > > > > + */ > > > > > > + port_type = pcie_cap_get_type(pci_dev); > > > > > > + if (port_type == PCI_EXP_TYPE_DOWNSTREAM || > > > > > > + port_type == PCI_EXP_TYPE_ROOT_PORT) { > > > > > > + if (!pci_is_express(dev) || > > > > > > + (pci_is_express(dev) && > > > > > > + !pcie_find_capability(dev, PCI_EXT_CAP_ID_ARI))) { > > > > > > > > > > I would just skip the test for a non express function. > > > > > > > > > Sorry? > > > > > > > > Best regards, > > > > -Gonglei > > > > > > > > > > + if (PCI_SLOT(dev->devfn) != 0) { > > > > > > + error_report("PCIe: non-ARI device can't be > > > > > populated" > > > > > > + " in slot %d", > > > > > PCI_SLOT(dev->devfn)); > > > > > > + return -1; > > > > > > + } > > > > > > + } > > > > > > + } > > > > > > + } > > > > > > + > > > > > > + return 0; > > > > > > +} > > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > > > > index d139d58..1d8f3f4 100644 > > > > > > --- a/include/hw/pci/pcie.h > > > > > > +++ b/include/hw/pci/pcie.h > > > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev, > > > > > > uint16_t offset, uint16_t size); > > > > > > > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t > > > > > > nextfn); > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev); > > > > > > > > > > > > extern const VMStateDescription vmstate_pcie_device; > > > > > > > > > > > > -- > > > > > > 1.7.12.4 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >