On 04/14/2014 05:42 PM, Alexander Graf wrote:
>
> On 14.04.14 09:41, Alexey Kardashevskiy wrote:
>> On 04/14/2014 05:34 PM, Alexander Graf wrote:
>>> On 11.04.14 18:30, Alexey Kardashevskiy wrote:
>>>> On 04/12/2014 02:15 AM, Alexander Graf wrote:
>>>>> On 11.04.14 18:01, Alexey Kardashevskiy wrote:
>>>>>> On 04/12/2014 01:38 AM, Alexander Graf wrote:
>>>>>>> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>>>>>>>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>>>>>>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>>>>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>>>>>>>>> change anything.
>>>>>>>>>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>>>>>>>>>> before the destination guest started accepting migration is different
>>>>>>>>>>>>>>>> from what the destination guest became after migration finished. And
>>>>>>>>>>>>>>>> migration cannot avoid changing this default state.
>>>>>>>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>>>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>>>>>>>>> XML which is sends during migration are different. libvirt thinks it
>>>>>>>>>>>>>> is ok while it keeps @reg property for (for example) spapr-vscsi
>>>>>>>>>>>>>> devices but it is not because since the order is different, devices
>>>>>>>>>>>>>> call IRQ allocator in different order and get different IRQs.
>>>>>>>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>>>>>>>> restart the virtual machine on the destination host it will have
>>>>>>>>>>>>> different IRQ numbering again, right?
>>>>>>>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>>>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>>>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>>>>>>>> this either.
>>>>>>>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>>>>>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>>>>>>>>> configuration?
>>>>>>>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>>>>>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>>>>>>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>>>>>>>> guest OS will expect the machine to look *the exact same* as before the
>>>>>>>>> shutdown.
>>>>>>>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>>>>>>>> INTA/B/C/D now) and check if they did not change during migration via
>>>>>>>> those
>>>>>>> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
>>>>>>> device creation. The problem seems pretty closely related.
>>>>>> What are those platform devices and what are you going to discuss
>>>>>> exactly?
>>>>> Devices that don't have a unified interrupt routing scheme like PCI where
>>>>> you just link lines A/B/C/D to your controller and you're good to go.
>>>> Ah. VIO in my case.
>>>>
>>>>>>>> VMSTATE.*EQUAL. Correct?
>>>>>>> Why would you need this? I think we already said a couple dozen times
>>>>>>> that configuration matching is a bigger problem, no?
>>>>>> For debug! It is not needed in general, yes.
>>>>>>
>>>>>>>> If so (more or less), I still would like to keep patches 1..7.
>>>>>>>> In fact, the first one is independent and we need it anyway.
>>>>>>>> Yes/no?
>>>>>>> Why?
>>>>>> IOMMUs do not migrate correctly - they only have a class have and
>>>>>> instance_id and this instance_it depends on command line arguments
>>>>>> order.
>>>>>> The #1 patch makes it classname + liobn.
>>>>> Why do we need a bus for that?
>>>> For BusClass::get_dev_path callback to get an unique name.
>>> Juan, I don't think it makes a lot of sense to require a new fake bus just
>>> to give us a consistent migration view of things.
>>>
>>> Do you have any ideas how to migration busless devices? We could just
>>> detect that case and give them numbering based on their occurence in the
>>> global QOM hierarchy, no?
>>
>> The mentioned instance_id is that occurrence number which totally depends
>> on the device order in the command line. And I have to not to depend on
>> that.
>
> So how would a bus fix that? The bus gets populated based on the command
> line order just as well, no?

The bus provides a unique name for every IOMMU, and every single IOMMU
always has instance_id == 0, so a migration chunk cannot possibly go to the
wrong device.

--
Alexey
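
To illustrate the point about naming, here is a minimal sketch in Python. It is a toy model and not QEMU code: the function names, the "spapr_iommu" class label, the path format and the LIOBN values are all made up for the example. It compares keying migration state by (class name, occurrence number) with keying it by (unique per-device path, instance_id 0) when the destination creates the same devices in a different order.

```python
# Toy model only: nothing here is a QEMU API. The class label, the path
# format and the LIOBN values are hypothetical.

def register_by_order(devices):
    """Key each device by (class name, per-class occurrence number)."""
    counters = {}
    table = {}
    for cls, liobn in devices:
        instance_id = counters.get(cls, 0)
        counters[cls] = instance_id + 1
        table[(cls, instance_id)] = liobn
    return table


def register_by_path(devices):
    """Key each device by (unique path, instance_id), instance_id always 0."""
    table = {}
    for cls, liobn in devices:
        path = f"{cls}@{liobn:x}"  # assumed path format: class name + liobn
        table[(path, 0)] = liobn
    return table


def misrouted(src_table, dst_table):
    """Keys whose state would land on a different device on the destination."""
    return sorted(k for k, liobn in src_table.items()
                  if dst_table.get(k) != liobn)


# Same two devices, created in opposite order on source and destination.
source = [("spapr_iommu", 0x80000000), ("spapr_iommu", 0x80000001)]
dest = list(reversed(source))

order_mismatches = misrouted(register_by_order(source), register_by_order(dest))
path_mismatches = misrouted(register_by_path(source), register_by_path(dest))

print("keyed by creation order:", order_mismatches)
print("keyed by unique path:   ", path_mismatches)
```

In this model the order-based keys pair each chunk with the wrong device as soon as the creation order differs, while the path-based keys are unaffected by ordering because the identifier is derived from the device itself.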