On 05/23/2014 12:25 AM, Alexey Kardashevskiy wrote: > On 05/22/2014 08:57 PM, Alexander Graf wrote: >> >> On 22.05.14 12:53, Alexey Kardashevskiy wrote: >>> On 05/22/2014 05:16 PM, Alexander Graf wrote:> >>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <a...@ozlabs.ru>: >>>>> >>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>>>>>> >>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>>>>>> interrupt as >>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>>>>>> problem for >>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>>>>>>>>> reload or >>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of >>>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no >>>>>>>>>>>>>>>> good >>>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>>> for that. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the >>>>>>>>>>>>>>>> first >>>>>>>>>>>>>>>> IRQ >>>>>>>>>>>>>>>> number >>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no >>>>>>>>>>>>>>>> need to >>>>>>>>>>>>>>>> store >>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags >>>>>>>>>>>>>>>> telling >>>>>>>>>>>>>>>> what >>>>>>>>>>>>>>>> type >>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to >>>>>>>>>>>>>>>> return >>>>>>>>>>>>>>>> error if >>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to >>>>>>>>>>>>>>>> disable >>>>>>>>>>>>>>>> MSI >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>> it has not enabled. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled >>>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>>> per PHB, >>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I >>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> remove >>>>>>>>>>>>>>>> msi/msix >>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>>>>>>>>> MSIX and >>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>>>>>>>>> limitation, >>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your >>>>>>>>>>>>>>> implementation? >>>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>>>>>>>>> simplify >>>>>>>>>>>>>> code by introducing one :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX >>>>>>>>>>>>>> enabled >>>>>>>>>>>>>> but >>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and >>>>>>>>>>>>>> confirm. >>>>>>>>>>>>>> >>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when >>>>>>>>>>>>>> MSI is >>>>>>>>>>>>>> already >>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>>>>>>>>> says MSI >>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>>>>>> rather not impose one. >>>>>>>>>>>>> >>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>>>>>>>>> the same time? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Alex >>>>>>>>>>>> No, and the PCI spec says: >>>>>>>>>>>> A function is permitted to implement both MSI and MSI-X, but >>>>>>>>>>>> system >>>>>>>>>>>> software is >>>>>>>>>>>> prohibited from enabling both at the same time. If system >>>>>>>>>>>> software >>>>>>>>>>>> enables both at the same time, the result is undefined. >>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>>>>>> Heh. This solves just half of the problem - I still have to keep >>>>>>>>>> track of >>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi >>>>>>>>>> interface. I >>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config >>>>>>>>>> space >>>>>>>>>> but >>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are >>>>>>>>>> made. >>>>>>>>>> And I cannot rely on address/data fields much as the guest can >>>>>>>>>> change them >>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing >>>>>>>>>> checks when >>>>>>>>>> I read them back for disposal, I'll fix in next round). >>>>>>>>>> >>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI >>>>>>>>>> config space >>>>>>>>>> and on "disable" check if it is not zero, then configuration took >>>>>>>>>> place, >>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did >>>>>>>>>> any change >>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>>>>>>>>> cases), I compare .data with what ICS can possibly have and either >>>>>>>>>> reject >>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the >>>>>>>>>> stupid guest. Would that be acceptable? >>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>>>>>> registers itself? Then you don't need to do sanity checks. >>>>>>>> I could for emulated devices but VFIO uses the same code. For example, >>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it >>>>>>>> saves >>>>>>>> MSIX BAR content, does reboot via some backdoor interface and >>>>>>>> restores MSIX >>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring >>>>>>>> MSIX data >>>>>>>> from cached values when guest is trying to restore it with what it >>>>>>>> thinks >>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is >>>>>>>> cache >>>>>>> We already have a cache because we don't access the real PCI >>>>>>> registers with >>>>>>> msi_set_message(), no? >>>>>> >>>>>> For emulated devices there is no cache. And in any case the guest is >>>>>> allowed to write to it... Who knows what AIX does? I do not. >>>>> >>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok >>>>> but >>>>> how to migrate such thing? Temporary cache somewhere and then unpack >>>>> it? Or >>>>> use old style migration callbacks? >>>> Could you try to introduce a new vmstate type that just serializes and >>>> deserializes hash tables? Maybe there is already a serialization >>>> function for it in glib? >>> I have not found any (most likely I do not know how to search there), >>> I added mine, here are VMSTATE_HASH + its use for SPAPR. >>> >>> >>> Is this a movement to right direction? I need to put key/value sizes >>> into VMSTATE definition somehow but do not really want to touch >>> VMStateField. >> >> I'm not sure. Juan? > > > I also tried to simplify this particular thing more by assuming that the > key is always "int" and put size of value to VMStateField::size. But there > is no way to get the size in VMStateInfo::get(). Or I could do a > "subsection" and try implementing has as an array (and have get/put defined > for items, should work) but in this case I'll need to cache number of > elements of the hash table somewhere so I'll need a wrapper around GHashTable. > > Making generalized version is not obvious for me :(
Juan? Alex? Anybody? Ping. How do I migrate GHashTable? If I am allowed to use custom and bit more polished get/put from "[PATCH 1/2] vmstate: Add helper to enable GHashTable migration", I'll be fine. Thanks! -- Alexey