On 05/22/2014 08:57 PM, Alexander Graf wrote: > > On 22.05.14 12:53, Alexey Kardashevskiy wrote: >> On 05/22/2014 05:16 PM, Alexander Graf wrote:> >>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <a...@ozlabs.ru>: >>>> >>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote: >>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote: >>>>>> >>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote: >>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote: >>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote: >>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote: >>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote: >>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote: >>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote: >>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote: >>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX >>>>>>>>>>>>>>> interrupt as >>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a >>>>>>>>>>>>>>> problem for >>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver >>>>>>>>>>>>>>> reload or >>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of >>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no >>>>>>>>>>>>>>> good >>>>>>>>>>>>>>> reason >>>>>>>>>>>>>>> for that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the >>>>>>>>>>>>>>> first >>>>>>>>>>>>>>> IRQ >>>>>>>>>>>>>>> number >>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no >>>>>>>>>>>>>>> need to >>>>>>>>>>>>>>> store >>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags >>>>>>>>>>>>>>> telling >>>>>>>>>>>>>>> what >>>>>>>>>>>>>>> type >>>>>>>>>>>>>>> of interrupt for which device it has configured in order to >>>>>>>>>>>>>>> return >>>>>>>>>>>>>>> error if >>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to >>>>>>>>>>>>>>> disable >>>>>>>>>>>>>>> MSI >>>>>>>>>>>>>>> which >>>>>>>>>>>>>>> it has not enabled. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled >>>>>>>>>>>>>>> devices >>>>>>>>>>>>>>> per PHB, >>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This changes migration stream as it fixes >>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name >>>>>>>>>>>>>>> which was >>>>>>>>>>>>>>> wrong since the beginning. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This fixed traces to be more informative. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I >>>>>>>>>>>>>>> could >>>>>>>>>>>>>>> remove >>>>>>>>>>>>>>> msi/msix >>>>>>>>>>>>>>> bitmaps from this patch, would it make sense? >>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between >>>>>>>>>>>>>> MSIX and >>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI >>>>>>>>>>>>>> limitation, >>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your >>>>>>>>>>>>>> implementation? >>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can >>>>>>>>>>>>> simplify >>>>>>>>>>>>> code by introducing one :) >>>>>>>>>>>>> >>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX >>>>>>>>>>>>> enabled >>>>>>>>>>>>> but >>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and >>>>>>>>>>>>> confirm. >>>>>>>>>>>>> >>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when >>>>>>>>>>>>> MSI is >>>>>>>>>>>>> already >>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also >>>>>>>>>>>>> says MSI >>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm. >>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd >>>>>>>>>>>> rather not impose one. >>>>>>>>>>>> >>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at >>>>>>>>>>>> the same time? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Alex >>>>>>>>>>> No, and the PCI spec says: >>>>>>>>>>> A function is permitted to implement both MSI and MSI-X, but >>>>>>>>>>> system >>>>>>>>>>> software is >>>>>>>>>>> prohibited from enabling both at the same time. If system >>>>>>>>>>> software >>>>>>>>>>> enables both at the same time, the result is undefined. >>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :). >>>>>>>>> Heh. This solves just half of the problem - I still have to keep >>>>>>>>> track of >>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi >>>>>>>>> interface. I >>>>>>>>> was hoping I can store such flag somewhere in a device PCI config >>>>>>>>> space >>>>>>>>> but >>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are >>>>>>>>> made. >>>>>>>>> And I cannot rely on address/data fields much as the guest can >>>>>>>>> change them >>>>>>>>> (I already use them to store IRQ numbers and btw it is missing >>>>>>>>> checks when >>>>>>>>> I read them back for disposal, I'll fix in next round). >>>>>>>>> >>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI >>>>>>>>> config space >>>>>>>>> and on "disable" check if it is not zero, then configuration took >>>>>>>>> place, >>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did >>>>>>>>> any change >>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest >>>>>>>>> cases), I compare .data with what ICS can possibly have and either >>>>>>>>> reject >>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the >>>>>>>>> stupid guest. Would that be acceptable? >>>>>>>> Can't you prohibit the guest from writing to the MSI configuration >>>>>>>> registers itself? Then you don't need to do sanity checks. >>>>>>> I could for emulated devices but VFIO uses the same code. For example, >>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it >>>>>>> saves >>>>>>> MSIX BAR content, does reboot via some backdoor interface and >>>>>>> restores MSIX >>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring >>>>>>> MSIX data >>>>>>> from cached values when guest is trying to restore it with what it >>>>>>> thinks >>>>>>> is actual MSIX data (it is virtualized because of x86). But there is >>>>>>> cache >>>>>> We already have a cache because we don't access the real PCI >>>>>> registers with >>>>>> msi_set_message(), no? >>>>> >>>>> For emulated devices there is no cache. And in any case the guest is >>>>> allowed to write to it... Who knows what AIX does? I do not. >>>> >>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok >>>> but >>>> how to migrate such thing? Temporary cache somewhere and then unpack >>>> it? Or >>>> use old style migration callbacks? >>> Could you try to introduce a new vmstate type that just serializes and >>> deserializes hash tables? Maybe there is already a serialization >>> function for it in glib? >> I have not found any (most likely I do not know how to search there), >> I added mine, here are VMSTATE_HASH + its use for SPAPR. >> >> >> Is this a movement to right direction? I need to put key/value sizes >> into VMSTATE definition somehow but do not really want to touch >> VMStateField. > > I'm not sure. Juan?
I also tried to simplify this particular thing more by assuming that the key is always "int" and put size of value to VMStateField::size. But there is no way to get the size in VMStateInfo::get(). Or I could do a "subsection" and try implementing has as an array (and have get/put defined for items, should work) but in this case I'll need to cache number of elements of the hash table somewhere so I'll need a wrapper around GHashTable. Making generalized version is not obvious for me :( -- Alexey