On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote: > > > On 7/1/2015 18:36, Michael S. Tsirkin wrote: > >On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote: > >> > >>On 7/1/2015 17:22, Michael S. Tsirkin wrote: > >>>On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote: > >>>>On 7/1/2015 16:05, Michael S. Tsirkin wrote: > >>>>>On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote: > >>>>>>On 7/1/2015 14:22, Michael S. Tsirkin wrote: > >>>>>>>On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote: > >>>>>>>>On 6/29/2015 18:01, Michael S. Tsirkin wrote: > >>>>>>>>>On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote: > >>>>>>>>>>This patch introduce a new facility(and bus) > >>>>>>>>>>to hold devices representing information actually > >>>>>>>>>>provided by s390 firmware and I/O configuration. > >>>>>>>>>>usage example: > >>>>>>>>>>-device s390-pcihost > >>>>>>>>>>-device vfio-pci,host=0000:00:00.0,id=vpci1 > >>>>>>>>>>-device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1 > >>>>>>>>>> > >>>>>>>>>>The first line will create a s390 pci host bridge > >>>>>>>>>>and init the root bus. The second line will create > >>>>>>>>>>a standard vfio pci device, and attach it to the > >>>>>>>>>>root bus. These are similiar to the standard process > >>>>>>>>>>to define a pci device on other platform. > >>>>>>>>>> > >>>>>>>>>>The third line will create a s390 pci device to > >>>>>>>>>>store s390 specific information, and references > >>>>>>>>>>the corresponding vfio pci device via device id. > >>>>>>>>>>We create a s390 pci facility bus to hold all the > >>>>>>>>>>zpci devices. > >>>>>>>>>> > >>>>>>>>>>Signed-off-by: Hong Bo Li <lih...@linux.vnet.ibm.com> > >>>>>>>>>It's mostly up to s390 maintainers, but I'd like to note > >>>>>>>>>one thing below > >>>>>>>>> > >>>>>>>>>>--- > >>>>>>>>>> hw/s390x/s390-pci-bus.c | 314 > >>>>>>>>>> +++++++++++++++++++++++++++++++++------------ > >>>>>>>>>> hw/s390x/s390-pci-bus.h | 48 ++++++- > >>>>>>>>>> hw/s390x/s390-pci-inst.c | 4 +- > >>>>>>>>>> hw/s390x/s390-virtio-ccw.c | 5 +- > >>>>>>>>>> 4 files changed, 283 insertions(+), 88 deletions(-) > >>>>>>>>>> > >>>>>>>>>>diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > >>>>>>>>>>index 560b66a..d5e7b2e 100644 > >>>>>>>>>>--- a/hw/s390x/s390-pci-bus.c > >>>>>>>>>>+++ b/hw/s390x/s390-pci-bus.c > >>>>>>>>>>@@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res) > >>>>>>>>>> PciCcdfErr *eccdf; > >>>>>>>>>> int rc = 1; > >>>>>>>>>> SeiContainer *sei_cont; > >>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL)); > >>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>> if (!s) { > >>>>>>>>>> return rc; > >>>>>>>>>>@@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res) > >>>>>>>>>> int chsc_sei_nt2_have_event(void) > >>>>>>>>>> { > >>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL)); > >>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>> if (!s) { > >>>>>>>>>> return 0; > >>>>>>>>>>@@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void) > >>>>>>>>>> return !QTAILQ_EMPTY(&s->pending_sei); > >>>>>>>>>> } > >>>>>>>>>>+void s390_pci_device_enable(S390PCIBusDevice *zpci) > >>>>>>>>>>+{ > >>>>>>>>>>+ zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET; > >>>>>>>>>>+} > >>>>>>>>>>+ > >>>>>>>>>>+void s390_pci_device_disable(S390PCIBusDevice *zpci) > >>>>>>>>>>+{ > >>>>>>>>>>+ zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET); > >>>>>>>>>>+ if (zpci->is_unplugged) > >>>>>>>>>>+ object_unparent(OBJECT(zpci)); > >>>>>>>>>>+} > >>>>>>>>>>+ > >>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid) > >>>>>>>>>> { > >>>>>>>>>> S390PCIBusDevice *pbdev; > >>>>>>>>>>- int i; > >>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL)); > >>>>>>>>>>+ BusChild *kid; > >>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>> if (!s) { > >>>>>>>>>> return NULL; > >>>>>>>>>> } > >>>>>>>>>>- for (i = 0; i < PCI_SLOT_MAX; i++) { > >>>>>>>>>>- pbdev = &s->pbdev[i]; > >>>>>>>>>>- if ((pbdev->fh != 0) && (pbdev->fid == fid)) { > >>>>>>>>>>+ QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) { > >>>>>>>>>>+ pbdev = (S390PCIBusDevice *)kid->child; > >>>>>>>>>>+ if (pbdev->fid == fid) { > >>>>>>>>>> return pbdev; > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>>@@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int configure, > >>>>>>>>>>SCCB *sccb) > >>>>>>>>>> return; > >>>>>>>>>> } > >>>>>>>>>>-static uint32_t s390_pci_get_pfid(PCIDevice *pdev) > >>>>>>>>>>-{ > >>>>>>>>>>- return PCI_SLOT(pdev->devfn); > >>>>>>>>>>-} > >>>>>>>>>>- > >>>>>>>>>>-static uint32_t s390_pci_get_pfh(PCIDevice *pdev) > >>>>>>>>>>-{ > >>>>>>>>>>- return PCI_SLOT(pdev->devfn) | FH_VIRT; > >>>>>>>>>>-} > >>>>>>>>>>- > >>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx) > >>>>>>>>>> { > >>>>>>>>>> S390PCIBusDevice *pbdev; > >>>>>>>>>>- int i; > >>>>>>>>>>- int j = 0; > >>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL)); > >>>>>>>>>>+ BusChild *kid; > >>>>>>>>>>+ int i = 0; > >>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>> if (!s) { > >>>>>>>>>> return NULL; > >>>>>>>>>> } > >>>>>>>>>>- for (i = 0; i < PCI_SLOT_MAX; i++) { > >>>>>>>>>>- pbdev = &s->pbdev[i]; > >>>>>>>>>>- > >>>>>>>>>>- if (pbdev->fh == 0) { > >>>>>>>>>>- continue; > >>>>>>>>>>- } > >>>>>>>>>>- > >>>>>>>>>>- if (j == idx) { > >>>>>>>>>>+ QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) { > >>>>>>>>>>+ pbdev = (S390PCIBusDevice *)kid->child; > >>>>>>>>>>+ if (i == idx) { > >>>>>>>>>> return pbdev; > >>>>>>>>>> } > >>>>>>>>>>- j++; > >>>>>>>>>>+ i++; > >>>>>>>>>> } > >>>>>>>>>> return NULL; > >>>>>>>>>This relies on the order of children on the qbus, that's wrong I > >>>>>>>>>think. > >>>>>>>>>Generally I'm not sure why do you convert all slot lookups to child > >>>>>>>>>lookups: more code to achieve the same effect? > >>>>>>>>Thank you Michael. > >>>>>>>>I do the change due to two reasons: > >>>>>>>>1. The old implement only supports one s390 pci root bus, and > >>>>>>>>32(PCI_SLOT_MAX) > >>>>>>>>slots at most. So when it comes to multiple s390 pci root buses, the > >>>>>>>>old code > >>>>>>>>does not work. > >>>>>>>>2. Now the zpci device "S390PCIBusDevice" is only a structure to store > >>>>>>>>s390 specific information, so we can attach all the zpci devices to a > >>>>>>>>s390 pci facility bus. Since these zpci device has no relation with > >>>>>>>>the "slot", > >>>>>>>>so the order of them does not matter. > >>>>>>>But you make this order guest-visible which seems wrong. > >>>>>>> > >>>>>>The guest uses a s390 specific "list pci" instruction to get all the > >>>>>>zpci > >>>>>>devices, and will > >>>>>>create a root s390 pci bus for each device. So the order has no > >>>>>>relation > >>>>>>with the pci > >>>>>>topology on guest. > >>>>>> > >>>>>>If we assign too many zpci devices to one guest, the "list pci" > >>>>>>instruction > >>>>>>will use a > >>>>>>resume token to get all the zpci devices. For example, first time we > >>>>>>return > >>>>>>32 zpci > >>>>>>devices to guest. Next time we'll return another 32 zpci devices. The > >>>>>>resume > >>>>>>token > >>>>>>is used to store the beginning of zpci devices that will be returned to > >>>>>>guest at next time. > >>>>>> > >>>>>>So, if we change the order of the zpci device on s390 facility bus, it > >>>>>>may > >>>>>>change the > >>>>>>"batch" in which this device be returned to guest. But this will not > >>>>>>change > >>>>>>the pci > >>>>>>topology on guest. > >>>>>Yes but that's still guest visible, and will break > >>>>>for example if guest is migrated between qemu instances > >>>>>where list order is different precisely when > >>>>>it's enumerating the bus. > >>>>> > >>>>Yes, and the list order is not the only s390 specific information that > >>>>exposed to > >>>>guest. Besides that, we need to migrate all other zpci information. For > >>>>now, > >>>>we have no plan to support zpci migration yet. > >>>BTW how will hotplug work? If it happens while guest > >>>enumerates the bus the naturally all index values > >>>become invalid. > >>The list zpci only happen when the guest doing pci_base_init() for s390. > >>At that moment, hotplug does not work yet. > >You can't prevent this: user can request hotplug at this time. > > > >>And assume we have > >>that case, we still have the index issue even when scan standard pci > >>bus. Please see my following words. > >> > >>>Just don't expose internal qdev data structures to guest. > >>>It's not by chance that we don't have a look up by index > >>>capability, it's an attempt to enfoce sane usage. > >>>You are misusing the API with your hack. > >>The resume token of list zpci is indeed an index of iteration:( > >> > >>>PCI has standard ways to enumerate the bus, maybe you > >>>should emulate it. Or find some other way that works. > >>>The idea to poke at s->fbus->qbus and count things there > >>>is a bad one. > >>> > >>I can define multiple zpci buses, and attach zpci device to a slot of a root > >>bus. > >>Then I need to add a api to the common pci code to do the scan of all the > >>pci host bridges. And in this way, it still has the index issue. I need to > >>scan > >>from the first bus to count the index. So any suggestion from you? > >> > >OK, I looked at arch/s390/pci/pci.c. > >First of all, it seems to run the regular PCI thing on bridges. > > > > zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops, > > zdev, &resources); > > At this moment, the guest has got all the zpci devices through clp list zpci > instruction. For each device, in the pci_scan_root_bus(), it will create > a root bus. So for s390, we get pci devices first, then create a new root bus > for it.
I don't see this in guest code. I looked at pci_scan_root_bus and it's completely generic. It sets up the bus: b = pci_create_root_bus(parent, bus, ops, sysdata, resources); then it scans it: max = pci_scan_child_bus(b); that one does /* Go find them, Rover! */ for (devfn = 0; devfn < 0x100; devfn += 8) pci_scan_slot(bus, devfn); next dev = pci_scan_single_device(bus, devfn); and so on. Eventually you get if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000)) return NULL; and that one does the clp thing using zpci_cfg_load. > > > > >so to me, it looks like there's no need to expose > >non-root buses through special means. > > > >What to do for root buses is a different question but again, > >you definitely do not want to rely on the order of things > >on that linked list. > >The simplest thing is to ask user to give them unique > >numbers, or find some stable way to sort them that > >does not rely on order of initialization (e.g. device IDs?). > > > >But again, this only works ok for root buses. > > > Basically, it does not exposed the buses to guest, it exposed an index > to guest. > Here is the process to get all the zpci device for a guest. > For example: we have 10 zpci devices, and the batch size for list zpci > instruction is 4. > First, qemu will return devices 0-3, index of list zpci is 0 > Second, qemu will return device 4-7, index of list zpci is 4 > Third, qemu will return device 8-9, index of list zpci is 8 > We have device id, but list zpci does not use that as a flag to get > next batch, it use an index instead. > This process is defined by s390 arch, we can't change it. > So no matter how we organize zpci devices in qemu, slot or link list. > We could not get rid of the index issue. > > How about I add a flag to identify whether the link list > is valid or not. When a hotplug/unplug event occurred, I will > reset the index, and make the guest refetch the zpci devices > from the beginning. > > > > You should just use something stable for IDs. And avoid doing it for anything that isn't a root or maybe a bridge since it'll just cause everyone maintainance problems down the road. -- MST