On 05/22/2014 05:16 PM, Alexander Graf wrote:> > >> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <a...@ozlabs.ru>: >> >>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>>> >>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>>> interrupt as >>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>>> problem for >>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>>>>>> reload or >>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of >>>>>>>>>>>>> devices >>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good >>>>>>>>>>>>> reason >>>>>>>>>>>>> for that. >>>>>>>>>>>>> >>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>>> >>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first >>>>>>>>>>>>> IRQ >>>>>>>>>>>>> number >>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no >>>>>>>>>>>>> need to >>>>>>>>>>>>> store >>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags >>>>>>>>>>>>> telling >>>>>>>>>>>>> what >>>>>>>>>>>>> type >>>>>>>>>>>>> of interrupt for which device it has configured in order to return >>>>>>>>>>>>> error if >>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable >>>>>>>>>>>>> MSI >>>>>>>>>>>>> which >>>>>>>>>>>>> it has not enabled. >>>>>>>>>>>>> >>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled >>>>>>>>>>>>> devices >>>>>>>>>>>>> per PHB, >>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>>> >>>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>>> which was >>>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>>> >>>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>>> >>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>>> --- >>>>>>>>>>>>> >>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could >>>>>>>>>>>>> remove >>>>>>>>>>>>> msi/msix >>>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>>>>>> MSIX and >>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>>>>>> limitation, >>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation? >>>>>>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>>>>>> simplify >>>>>>>>>>> code by introducing one :) >>>>>>>>>>> >>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX >>>>>>>>>>> enabled >>>>>>>>>>> but >>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm. >>>>>>>>>>> >>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is >>>>>>>>>>> already >>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>>>>>> says MSI >>>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>>> rather not impose one. >>>>>>>>>> >>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>>>>>> the same time? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Alex >>>>>>>>> No, and the PCI spec says: >>>>>>>>> A function is permitted to implement both MSI and MSI-X, but >>>>>>>>> system >>>>>>>>> software is >>>>>>>>> prohibited from enabling both at the same time. If system >>>>>>>>> software >>>>>>>>> enables both at the same time, the result is undefined. >>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>>> Heh. This solves just half of the problem - I still have to keep track >>>>>>> of >>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I >>>>>>> was hoping I can store such flag somewhere in a device PCI config space >>>>>>> but >>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are >>>>>>> made. >>>>>>> And I cannot rely on address/data fields much as the guest can change >>>>>>> them >>>>>>> (I already use them to store IRQ numbers and btw it is missing checks >>>>>>> when >>>>>>> I read them back for disposal, I'll fix in next round). >>>>>>> >>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config >>>>>>> space >>>>>>> and on "disable" check if it is not zero, then configuration took place, >>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any >>>>>>> change >>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>>>>>> cases), I compare .data with what ICS can possibly have and either >>>>>>> reject >>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the >>>>>>> stupid guest. Would that be acceptable? >>>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>>> registers itself? Then you don't need to do sanity checks. >>>>> >>>>> I could for emulated devices but VFIO uses the same code. For example, >>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves >>>>> MSIX BAR content, does reboot via some backdoor interface and restores >>>>> MSIX >>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data >>>>> from cached values when guest is trying to restore it with what it thinks >>>>> is actual MSIX data (it is virtualized because of x86). But there is cache >>>> >>>> We already have a cache because we don't access the real PCI registers with >>>> msi_set_message(), no? >>> >>> >>> For emulated devices there is no cache. And in any case the guest is >>> allowed to write to it... Who knows what AIX does? I do not. >> >> >> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but >> how to migrate such thing? Temporary cache somewhere and then unpack it? Or >> use old style migration callbacks? >
> Could you try to introduce a new vmstate type that just serializes and > deserializes hash tables? Maybe there is already a serialization > function for it in glib? I have not found any (most likely I do not know how to search there), I added mine, here are VMSTATE_HASH + its use for SPAPR. Is this a movement to right direction? I need to put key/value sizes into VMSTATE definition somehow but do not really want to touch VMStateField. Alexey Kardashevskiy (2): vmstate: Add helper to enable GHashTable migration spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB hw/ppc/spapr_pci.c | 144 +++++++++++++++++--------------------------- include/hw/pci-host/spapr.h | 13 ++-- include/migration/vmstate.h | 10 +++ include/qemu-common.h | 13 ++++ trace-events | 5 +- vmstate.c | 54 +++++++++++++++++ 6 files changed, 141 insertions(+), 98 deletions(-) -- 1.9.rc0