> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <a...@ozlabs.ru>: > >> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>> >>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>> interrupt as >>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>> problem for >>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>>>>> reload or >>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of >>>>>>>>>>>> devices >>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good >>>>>>>>>>>> reason >>>>>>>>>>>> for that. >>>>>>>>>>>> >>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>> >>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first >>>>>>>>>>>> IRQ >>>>>>>>>>>> number >>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no >>>>>>>>>>>> need to >>>>>>>>>>>> store >>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling >>>>>>>>>>>> what >>>>>>>>>>>> type >>>>>>>>>>>> of interrupt for which device it has configured in order to return >>>>>>>>>>>> error if >>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable >>>>>>>>>>>> MSI >>>>>>>>>>>> which >>>>>>>>>>>> it has not enabled. >>>>>>>>>>>> >>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices >>>>>>>>>>>> per PHB, >>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>> >>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>> which was >>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>> >>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>> --- >>>>>>>>>>>> >>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could >>>>>>>>>>>> remove >>>>>>>>>>>> msi/msix >>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>>>>> MSIX and >>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>>>>> limitation, >>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation? >>>>>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>>>>> simplify >>>>>>>>>> code by introducing one :) >>>>>>>>>> >>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled >>>>>>>>>> but >>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm. >>>>>>>>>> >>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is >>>>>>>>>> already >>>>>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>>>>> says MSI >>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>> rather not impose one. >>>>>>>>> >>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>>>>> the same time? >>>>>>>>> >>>>>>>>> >>>>>>>>> Alex >>>>>>>> No, and the PCI spec says: >>>>>>>> A function is permitted to implement both MSI and MSI-X, but >>>>>>>> system >>>>>>>> software is >>>>>>>> prohibited from enabling both at the same time. If system software >>>>>>>> enables both at the same time, the result is undefined. >>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>> Heh. This solves just half of the problem - I still have to keep track of >>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I >>>>>> was hoping I can store such flag somewhere in a device PCI config space >>>>>> but >>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are >>>>>> made. >>>>>> And I cannot rely on address/data fields much as the guest can change >>>>>> them >>>>>> (I already use them to store IRQ numbers and btw it is missing checks >>>>>> when >>>>>> I read them back for disposal, I'll fix in next round). >>>>>> >>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space >>>>>> and on "disable" check if it is not zero, then configuration took place, >>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any >>>>>> change >>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>>>>> cases), I compare .data with what ICS can possibly have and either reject >>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the >>>>>> stupid guest. Would that be acceptable? >>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>> registers itself? Then you don't need to do sanity checks. >>>> >>>> I could for emulated devices but VFIO uses the same code. For example, >>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves >>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX >>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data >>>> from cached values when guest is trying to restore it with what it thinks >>>> is actual MSIX data (it is virtualized because of x86). But there is cache >>> >>> We already have a cache because we don't access the real PCI registers with >>> msi_set_message(), no? >> >> >> For emulated devices there is no cache. And in any case the guest is >> allowed to write to it... Who knows what AIX does? I do not. > > > Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but > how to migrate such thing? Temporary cache somewhere and then unpack it? Or > use old style migration callbacks?
Could you try to introduce a new vmstate type that just serializes and deserializes hash tables? Maybe there is already a serialization function for it in glib? Alex