On Thu, Jul 02, 2015 at 10:57:34AM +0800, Hong Bo Li wrote: > > > On 7/1/2015 21:37, Michael S. Tsirkin wrote: > >On Wed, Jul 01, 2015 at 08:42:52PM +0800, Hong Bo Li wrote: > >> > >>On 7/1/2015 19:57, Michael S. Tsirkin wrote: > >>>On Wed, Jul 01, 2015 at 07:46:01PM +0800, Hong Bo Li wrote: > >>>>On 7/1/2015 19:23, Michael S. Tsirkin wrote: > >>>>>On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote: > >>>>>>On 7/1/2015 18:36, Michael S. Tsirkin wrote: > >>>>>>>On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote: > >>>>>>>>On 7/1/2015 17:22, Michael S. Tsirkin wrote: > >>>>>>>>>On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote: > >>>>>>>>>>On 7/1/2015 16:05, Michael S. Tsirkin wrote: > >>>>>>>>>>>On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote: > >>>>>>>>>>>>On 7/1/2015 14:22, Michael S. Tsirkin wrote: > >>>>>>>>>>>>>On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote: > >>>>>>>>>>>>>>On 6/29/2015 18:01, Michael S. Tsirkin wrote: > >>>>>>>>>>>>>>>On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote: > >>>>>>>>>>>>>>>>This patch introduce a new facility(and bus) > >>>>>>>>>>>>>>>>to hold devices representing information actually > >>>>>>>>>>>>>>>>provided by s390 firmware and I/O configuration. > >>>>>>>>>>>>>>>>usage example: > >>>>>>>>>>>>>>>>-device s390-pcihost > >>>>>>>>>>>>>>>>-device vfio-pci,host=0000:00:00.0,id=vpci1 > >>>>>>>>>>>>>>>>-device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>The first line will create a s390 pci host bridge > >>>>>>>>>>>>>>>>and init the root bus. The second line will create > >>>>>>>>>>>>>>>>a standard vfio pci device, and attach it to the > >>>>>>>>>>>>>>>>root bus. These are similiar to the standard process > >>>>>>>>>>>>>>>>to define a pci device on other platform. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>The third line will create a s390 pci device to > >>>>>>>>>>>>>>>>store s390 specific information, and references > >>>>>>>>>>>>>>>>the corresponding vfio pci device via device id. > >>>>>>>>>>>>>>>>We create a s390 pci facility bus to hold all the > >>>>>>>>>>>>>>>>zpci devices. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>Signed-off-by: Hong Bo Li <lih...@linux.vnet.ibm.com> > >>>>>>>>>>>>>>>It's mostly up to s390 maintainers, but I'd like to note > >>>>>>>>>>>>>>>one thing below > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>--- > >>>>>>>>>>>>>>>> hw/s390x/s390-pci-bus.c | 314 > >>>>>>>>>>>>>>>> +++++++++++++++++++++++++++++++++------------ > >>>>>>>>>>>>>>>> hw/s390x/s390-pci-bus.h | 48 ++++++- > >>>>>>>>>>>>>>>> hw/s390x/s390-pci-inst.c | 4 +- > >>>>>>>>>>>>>>>> hw/s390x/s390-virtio-ccw.c | 5 +- > >>>>>>>>>>>>>>>> 4 files changed, 283 insertions(+), 88 deletions(-) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > >>>>>>>>>>>>>>>>index 560b66a..d5e7b2e 100644 > >>>>>>>>>>>>>>>>--- a/hw/s390x/s390-pci-bus.c > >>>>>>>>>>>>>>>>+++ b/hw/s390x/s390-pci-bus.c > >>>>>>>>>>>>>>>>@@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res) > >>>>>>>>>>>>>>>> PciCcdfErr *eccdf; > >>>>>>>>>>>>>>>> int rc = 1; > >>>>>>>>>>>>>>>> SeiContainer *sei_cont; > >>>>>>>>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, > >>>>>>>>>>>>>>>>NULL)); > >>>>>>>>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>>>>>>>> if (!s) { > >>>>>>>>>>>>>>>> return rc; > >>>>>>>>>>>>>>>>@@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res) > >>>>>>>>>>>>>>>> int chsc_sei_nt2_have_event(void) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, > >>>>>>>>>>>>>>>>NULL)); > >>>>>>>>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>>>>>>>> if (!s) { > >>>>>>>>>>>>>>>> return 0; > >>>>>>>>>>>>>>>>@@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void) > >>>>>>>>>>>>>>>> return !QTAILQ_EMPTY(&s->pending_sei); > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>+void s390_pci_device_enable(S390PCIBusDevice *zpci) > >>>>>>>>>>>>>>>>+{ > >>>>>>>>>>>>>>>>+ zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET; > >>>>>>>>>>>>>>>>+} > >>>>>>>>>>>>>>>>+ > >>>>>>>>>>>>>>>>+void s390_pci_device_disable(S390PCIBusDevice *zpci) > >>>>>>>>>>>>>>>>+{ > >>>>>>>>>>>>>>>>+ zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET); > >>>>>>>>>>>>>>>>+ if (zpci->is_unplugged) > >>>>>>>>>>>>>>>>+ object_unparent(OBJECT(zpci)); > >>>>>>>>>>>>>>>>+} > >>>>>>>>>>>>>>>>+ > >>>>>>>>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>> S390PCIBusDevice *pbdev; > >>>>>>>>>>>>>>>>- int i; > >>>>>>>>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, > >>>>>>>>>>>>>>>>NULL)); > >>>>>>>>>>>>>>>>+ BusChild *kid; > >>>>>>>>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>>>>>>>> if (!s) { > >>>>>>>>>>>>>>>> return NULL; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>- for (i = 0; i < PCI_SLOT_MAX; i++) { > >>>>>>>>>>>>>>>>- pbdev = &s->pbdev[i]; > >>>>>>>>>>>>>>>>- if ((pbdev->fh != 0) && (pbdev->fid == fid)) { > >>>>>>>>>>>>>>>>+ QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) { > >>>>>>>>>>>>>>>>+ pbdev = (S390PCIBusDevice *)kid->child; > >>>>>>>>>>>>>>>>+ if (pbdev->fid == fid) { > >>>>>>>>>>>>>>>> return pbdev; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>@@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int > >>>>>>>>>>>>>>>>configure, SCCB *sccb) > >>>>>>>>>>>>>>>> return; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>-static uint32_t s390_pci_get_pfid(PCIDevice *pdev) > >>>>>>>>>>>>>>>>-{ > >>>>>>>>>>>>>>>>- return PCI_SLOT(pdev->devfn); > >>>>>>>>>>>>>>>>-} > >>>>>>>>>>>>>>>>- > >>>>>>>>>>>>>>>>-static uint32_t s390_pci_get_pfh(PCIDevice *pdev) > >>>>>>>>>>>>>>>>-{ > >>>>>>>>>>>>>>>>- return PCI_SLOT(pdev->devfn) | FH_VIRT; > >>>>>>>>>>>>>>>>-} > >>>>>>>>>>>>>>>>- > >>>>>>>>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx) > >>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>> S390PCIBusDevice *pbdev; > >>>>>>>>>>>>>>>>- int i; > >>>>>>>>>>>>>>>>- int j = 0; > >>>>>>>>>>>>>>>>- S390pciState *s = S390_PCI_HOST_BRIDGE( > >>>>>>>>>>>>>>>>- object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, > >>>>>>>>>>>>>>>>NULL)); > >>>>>>>>>>>>>>>>+ BusChild *kid; > >>>>>>>>>>>>>>>>+ int i = 0; > >>>>>>>>>>>>>>>>+ S390PCIFacility *s = S390_PCI_FACILITY( > >>>>>>>>>>>>>>>>+ object_resolve_path(TYPE_S390_PCI_FACILITY, NULL)); > >>>>>>>>>>>>>>>> if (!s) { > >>>>>>>>>>>>>>>> return NULL; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>- for (i = 0; i < PCI_SLOT_MAX; i++) { > >>>>>>>>>>>>>>>>- pbdev = &s->pbdev[i]; > >>>>>>>>>>>>>>>>- > >>>>>>>>>>>>>>>>- if (pbdev->fh == 0) { > >>>>>>>>>>>>>>>>- continue; > >>>>>>>>>>>>>>>>- } > >>>>>>>>>>>>>>>>- > >>>>>>>>>>>>>>>>- if (j == idx) { > >>>>>>>>>>>>>>>>+ QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) { > >>>>>>>>>>>>>>>>+ pbdev = (S390PCIBusDevice *)kid->child; > >>>>>>>>>>>>>>>>+ if (i == idx) { > >>>>>>>>>>>>>>>> return pbdev; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>- j++; > >>>>>>>>>>>>>>>>+ i++; > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> return NULL; > >>>>>>>>>>>>>>>This relies on the order of children on the qbus, that's wrong > >>>>>>>>>>>>>>>I think. > >>>>>>>>>>>>>>>Generally I'm not sure why do you convert all slot lookups to > >>>>>>>>>>>>>>>child > >>>>>>>>>>>>>>>lookups: more code to achieve the same effect? > >>>>>>>>>>>>>>Thank you Michael. > >>>>>>>>>>>>>>I do the change due to two reasons: > >>>>>>>>>>>>>>1. The old implement only supports one s390 pci root bus, and > >>>>>>>>>>>>>>32(PCI_SLOT_MAX) > >>>>>>>>>>>>>>slots at most. So when it comes to multiple s390 pci root > >>>>>>>>>>>>>>buses, the old code > >>>>>>>>>>>>>>does not work. > >>>>>>>>>>>>>>2. Now the zpci device "S390PCIBusDevice" is only a structure > >>>>>>>>>>>>>>to store > >>>>>>>>>>>>>>s390 specific information, so we can attach all the zpci > >>>>>>>>>>>>>>devices to a > >>>>>>>>>>>>>>s390 pci facility bus. Since these zpci device has no relation > >>>>>>>>>>>>>>with the "slot", > >>>>>>>>>>>>>>so the order of them does not matter. > >>>>>>>>>>>>>But you make this order guest-visible which seems wrong. > >>>>>>>>>>>>> > >>>>>>>>>>>>The guest uses a s390 specific "list pci" instruction to get all > >>>>>>>>>>>>the zpci > >>>>>>>>>>>>devices, and will > >>>>>>>>>>>>create a root s390 pci bus for each device. So the order has no > >>>>>>>>>>>>relation > >>>>>>>>>>>>with the pci > >>>>>>>>>>>>topology on guest. > >>>>>>>>>>>> > >>>>>>>>>>>>If we assign too many zpci devices to one guest, the "list pci" > >>>>>>>>>>>>instruction > >>>>>>>>>>>>will use a > >>>>>>>>>>>>resume token to get all the zpci devices. For example, first time > >>>>>>>>>>>>we return > >>>>>>>>>>>>32 zpci > >>>>>>>>>>>>devices to guest. Next time we'll return another 32 zpci devices. > >>>>>>>>>>>>The resume > >>>>>>>>>>>>token > >>>>>>>>>>>>is used to store the beginning of zpci devices that will be > >>>>>>>>>>>>returned to > >>>>>>>>>>>>guest at next time. > >>>>>>>>>>>> > >>>>>>>>>>>>So, if we change the order of the zpci device on s390 facility > >>>>>>>>>>>>bus, it may > >>>>>>>>>>>>change the > >>>>>>>>>>>>"batch" in which this device be returned to guest. But this will > >>>>>>>>>>>>not change > >>>>>>>>>>>>the pci > >>>>>>>>>>>>topology on guest. > >>>>>>>>>>>Yes but that's still guest visible, and will break > >>>>>>>>>>>for example if guest is migrated between qemu instances > >>>>>>>>>>>where list order is different precisely when > >>>>>>>>>>>it's enumerating the bus. > >>>>>>>>>>> > >>>>>>>>>>Yes, and the list order is not the only s390 specific information > >>>>>>>>>>that > >>>>>>>>>>exposed to > >>>>>>>>>>guest. Besides that, we need to migrate all other zpci > >>>>>>>>>>information. For > >>>>>>>>>>now, > >>>>>>>>>>we have no plan to support zpci migration yet. > >>>>>>>>>BTW how will hotplug work? If it happens while guest > >>>>>>>>>enumerates the bus the naturally all index values > >>>>>>>>>become invalid. > >>>>>>>>The list zpci only happen when the guest doing pci_base_init() for > >>>>>>>>s390. > >>>>>>>>At that moment, hotplug does not work yet. > >>>>>>>You can't prevent this: user can request hotplug at this time. > >>>>>>> > >>>>>>>>And assume we have > >>>>>>>>that case, we still have the index issue even when scan standard pci > >>>>>>>>bus. Please see my following words. > >>>>>>>> > >>>>>>>>>Just don't expose internal qdev data structures to guest. > >>>>>>>>>It's not by chance that we don't have a look up by index > >>>>>>>>>capability, it's an attempt to enfoce sane usage. > >>>>>>>>>You are misusing the API with your hack. > >>>>>>>>The resume token of list zpci is indeed an index of iteration:( > >>>>>>>> > >>>>>>>>>PCI has standard ways to enumerate the bus, maybe you > >>>>>>>>>should emulate it. Or find some other way that works. > >>>>>>>>>The idea to poke at s->fbus->qbus and count things there > >>>>>>>>>is a bad one. > >>>>>>>>> > >>>>>>>>I can define multiple zpci buses, and attach zpci device to a slot of > >>>>>>>>a root > >>>>>>>>bus. > >>>>>>>>Then I need to add a api to the common pci code to do the scan of all > >>>>>>>>the > >>>>>>>>pci host bridges. And in this way, it still has the index issue. I > >>>>>>>>need to > >>>>>>>>scan > >>>>>>>>from the first bus to count the index. So any suggestion from you? > >>>>>>>OK, I looked at arch/s390/pci/pci.c. > >>>>>>>First of all, it seems to run the regular PCI thing on bridges. > >>>>>>> > >>>>>>> zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, > >>>>>>> &pci_root_ops, > >>>>>>> zdev, &resources); > >>>>>>At this moment, the guest has got all the zpci devices through clp list > >>>>>>zpci > >>>>>>instruction. For each device, in the pci_scan_root_bus(), it will create > >>>>>>a root bus. So for s390, we get pci devices first, then create a new > >>>>>>root bus > >>>>>>for it. > >>>>>I don't see this in guest code. > >>>>> > >>>>>I looked at pci_scan_root_bus and it's completely generic. > >>>>>It sets up the bus: > >>>>> b = pci_create_root_bus(parent, bus, ops, sysdata, resources); > >>>>> > >>>>>then it scans it: > >>>>> max = pci_scan_child_bus(b); > >>>>> > >>>>> > >>>>>that one does > >>>>> /* Go find them, Rover! */ > >>>>> for (devfn = 0; devfn < 0x100; devfn += 8) > >>>>> pci_scan_slot(bus, devfn); > >>>>> > >>>>>next > >>>>> dev = pci_scan_single_device(bus, devfn); > >>>>> > >>>>>and so on. Eventually you get > >>>>> if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000)) > >>>>> return NULL; > >>>>> > >>>>>and that one does the clp thing using zpci_cfg_load. > >>>>> > >>>>pci_base_init()-> clp_scan_pci_devices(): > >>>> rc = clp_list_pci(rrb, __clp_add); > >>>>In this function, there is a while loop to get all the zpci devices by > >>>>means > >>>>of > >>>>resume token(index). And for each device, > >>>> __clp_add()-> clp_add_pci_device(); > >>>>In clp_add_pci_device(), we use the zpci information to create a struct > >>>>zpci_dev zdev. > >>>>Then zpci_create_device()->zpci_scan_bus()->pci_scan_root_bus() > >>>> zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops, > >>>> zdev, &resources); > >>>>So, you see, each zdev has its own root bus. And there is no child bus > >>>>under > >>>>that root bus. > >>>Right - zdev *is* the root. But there are pci devices hanging off it. > >>We have multiple zdevs in kernel, and each zdev only has one pci device > >>attached to it. > >I see. It's nasty. Is it too late to fix in guest? > >Supporting bridges should just be a question of passing > >bus numbers to host. > > On the Linux OS level, there is no pci to pci bridge on s390, the bus > number, > slot number are all virtual and has no meaning, like these: > 0000:00:00.0 > 0001:00:00.0 > 0002:00:00.0 > ...... > Each zpci device is in a separate domain. > I add Sebastian to the list, he is the owner of s390 pci. I think he > could give some reasons why s390 pci implemented in this way. > > > >I guess you need to support old guests too, so this > >justifies some code in qemu. But you still need something > >stable to sort by, that does not depend on the order > >of initialization of devices. If all else fails, ask user > >to give you numbers. > > Thank you, it's a good idea, I can sort the devices by fid or uid. > > >And I'm still confused by this: > >>>>>>>>>>>>>>>>-device s390-pcihost > >>>>>>>>>>>>>>>>-device vfio-pci,host=0000:00:00.0,id=vpci1 > >>>>>>>>>>>>>>>>-device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1 > >why isn't vfio connected to zpci? why is it the other way around? > > I implement the hotplug in s390_pci_device_hot_plug() in the patch, > not the s390_pcihost_hot_plug(). It will do some s390 specific action. > If we define zpci first, then I need to do the real hotplug when hotplugging > a vfio-pci device. I think both of them are ok, you prefer the later one?
I prefer sane modeling, it shouldn't be driven by implementation detail. But I would like to note that pci device drivers require driver handshake before device goes away. IIUC s390 hotplug is immediate, which is a problem. Maybe doing the change will help make sure device removal is acked by guest before it happens? > >>>So why not model it like this? > >>> > >>>vfio should attach to zdev, zdev is the pci host. > >>> > >>>Also, you can stick a pci to pci bridge under the root, and > >>>everything will just work. > >>> > >>> > >>> > >>> > >>> > >>>>>>>so to me, it looks like there's no need to expose > >>>>>>>non-root buses through special means. > >>>>>>> > >>>>>>>What to do for root buses is a different question but again, > >>>>>>>you definitely do not want to rely on the order of things > >>>>>>>on that linked list. > >>>>>>>The simplest thing is to ask user to give them unique > >>>>>>>numbers, or find some stable way to sort them that > >>>>>>>does not rely on order of initialization (e.g. device IDs?). > >>>>>>> > >>>>>>>But again, this only works ok for root buses. > >>>>>>> > >>>>>>Basically, it does not exposed the buses to guest, it exposed an index > >>>>>>to guest. > >>>>>>Here is the process to get all the zpci device for a guest. > >>>>>>For example: we have 10 zpci devices, and the batch size for list zpci > >>>>>>instruction is 4. > >>>>>>First, qemu will return devices 0-3, index of list zpci is 0 > >>>>>>Second, qemu will return device 4-7, index of list zpci is 4 > >>>>>>Third, qemu will return device 8-9, index of list zpci is 8 > >>>>>>We have device id, but list zpci does not use that as a flag to get > >>>>>>next batch, it use an index instead. > >>>>>>This process is defined by s390 arch, we can't change it. > >>>>>>So no matter how we organize zpci devices in qemu, slot or link list. > >>>>>>We could not get rid of the index issue. > >>>>>> > >>>>>>How about I add a flag to identify whether the link list > >>>>>>is valid or not. When a hotplug/unplug event occurred, I will > >>>>>>reset the index, and make the guest refetch the zpci devices > >>>>>>from the beginning. > >>>>>> > >>>>>> > >>>>>You should just use something stable for IDs. > >>>>>And avoid doing it for anything that isn't a root or maybe a bridge > >>>>>since it'll just cause everyone maintainance problems down the road. > >>>>> > >>>>The list zpci instruction is defined by arch, not a software thing, I > >>>>could > >>>>not > >>>>change it to use a ID instead...