On 04/12/2014 02:15 AM, Alexander Graf wrote:
>
> On 11.04.14 18:01, Alexey Kardashevskiy wrote:
>> On 04/12/2014 01:38 AM, Alexander Graf wrote:
>>> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>>>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>>>>>>
>>>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>>>> support IRQ reuse in any form as it did not keep track of what it
>>>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>>>>> change anything.
>>>>>>>>>>>> I wrote the wrong commit message. By change I meant that the default
>>>>>>>>>>>> state before the destination guest started accepting migration is
>>>>>>>>>>>> different from what the destination guest became after migration
>>>>>>>>>>>> finished. And migration cannot avoid changing this default state.
>>>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>>>> Because QEMU creates devices in the order given on the command line,
>>>>>>>>>> and libvirt changes this order - the XML used to create the guest and
>>>>>>>>>> the XML which it sends during migration are different. libvirt thinks
>>>>>>>>>> it is ok while it keeps @reg property for (for example) spapr-vscsi
>>>>>>>>>> devices but it is not because since the order is different, devices
>>>>>>>>>> call IRQ allocator in different order and get different IRQs.
>>>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>>>> restart the virtual machine on the destination host it will have
>>>>>>>>> different IRQ numbering again, right?
>>>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>>>> this either.
>>>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>>>>> configuration?
>>>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>>>> guest OS will expect the machine to look *the exact same* as before the
>>>>> shutdown.
>>>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>>>> INTA/B/C/D now) and check if they did not change during migration via
>>>> those
>>> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
>>> device creation. The problem seems pretty closely related.
>> What are those platform devices and what are you going to discuss exactly?
>
> Devices that don't have a unified interrupt routing scheme like PCI where
> you just link lines A/B/C/D to your controller and you're good to go.

Ah. VIO in my case.

>>>> VMSTATE.*EQUAL. Correct?
>>> Why would you need this? I think we already said a couple dozen times that
>>> configuration matching is a bigger problem, no?
>> For debug! It is not needed in general, yes.
>>
>>
>>>> If so (more or less), I still would like to keep patches 1..7.
>>>> In fact, the first one is independent and we need it anyway.
>>>> Yes/no?
>>> Why?
>> IOMMUs do not migrate correctly - they only have a class name and
>> instance_id and this instance_id depends on command line arguments order.
>> The #1 patch makes it classname + liobn.
>
> Why do we need a bus for that?

For BusClass::get_dev_path callback to get a unique name.

>>>>>> config may be different. If it is "migrate to file" and then "migrate
>>>>>> from file" (do not know what you call it when migration goes to a pipe
>>>>>> which is "tar") - then config will be the same.
>>>>>>
>>>>>>
>>>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>>>> allocation which eventually got solved by libvirt specifying the
>>>>>>>>> slots manually.
>>>>>>>> We can do that too. Who decides? :)
>>>>>>> The better solution wins :)
>>>>>> We both know who decides ;) I posted the series, I need a heads up if
>>>>>> it is going the right way or not.
>>>>> It's not :). If a guest may not have different IRQ allocation after
>>>>> migration, it also must not have different IRQ allocation after
>>>>> shutdown + restart.
>>>> Ok. That's a good answer, thanks. How does x86 work then? IRQs are
>>>> hardcoded (some are for sure but I do not know about MSI)? Or in order
>>>> to support
>>> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
>>> interrupts which again are hardcoded to IOAPIC interrupt lines after some
>>> PCI line swizzling.
>> This is what I meant - I need to have a way to tell PHB IRQ numbers for
>> INTA/B/C/D.
>
> Yes, just like platform devices ;).

--
Alexey
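
To make the ordering problem from this thread concrete, here is a minimal Python sketch. It is not QEMU code: the two allocator classes, the device names, and the base IRQ number are all hypothetical and exist only for illustration. It contrasts an allocator that only remembers the last IRQ it returned with one where each device claims an explicitly configured IRQ.

```python
# Illustrative sketch only; none of these names come from QEMU.

class SequentialAllocator:
    """Hands out the next IRQ from a pool; only remembers the last one."""

    def __init__(self, base):
        self.next_irq = base

    def alloc(self):
        irq = self.next_irq
        self.next_irq += 1
        return irq


class ExplicitAllocator:
    """Tracks every claimed IRQ so a device can request a specific number."""

    def __init__(self, base, count):
        self.base = base
        self.count = count
        self.used = set()

    def claim(self, irq):
        if not (self.base <= irq < self.base + self.count):
            raise ValueError(f"IRQ {irq:#x} is outside the pool")
        if irq in self.used:
            raise ValueError(f"IRQ {irq:#x} is already claimed")
        self.used.add(irq)
        return irq


def realize_sequential(device_order, base=0x1000):
    """Devices get IRQs in creation order, so the order leaks into the result."""
    allocator = SequentialAllocator(base)
    return {name: allocator.alloc() for name in device_order}


def realize_explicit(device_irqs, device_order, base=0x1000, count=16):
    """Each device claims the IRQ from its own configuration."""
    allocator = ExplicitAllocator(base, count)
    return {name: allocator.claim(device_irqs[name]) for name in device_order}


# Hypothetical device lists: same devices, different command line order.
source_order = ["vscsi0", "vlan0", "vty0"]
dest_order = ["vty0", "vscsi0", "vlan0"]

# Hypothetical explicit configuration, as a management layer might supply it.
configured = {"vscsi0": 0x1000, "vlan0": 0x1001, "vty0": 0x1002}

seq_source = realize_sequential(source_order)
seq_dest = realize_sequential(dest_order)
exp_source = realize_explicit(configured, source_order)
exp_dest = realize_explicit(configured, dest_order)

print("sequential mappings match:", seq_source == seq_dest)
print("explicit mappings match:", exp_source == exp_dest)
```

With the sequential allocator the device-to-IRQ mapping follows creation order, so the two sides disagree. With explicit claims the mapping is the same whichever order the devices are created in, and it also stays the same across a shutdown and restart.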