On 21 March 2018 at 08:00, Shannon Zhao <zhaoshengl...@huawei.com> wrote:
> On 2018/3/20 19:54, Peter Maydell wrote:
>> Can you still successfully migrate a VM from a QEMU version
>> without this bugfix to one with the bugfix ?
>>
> I've tested this case. I can migrate a VM between these two versions.

Hmm. Looking at the code I can't see how that would work,
except by accident. Let me see if I understand what's happening
here:

In the code in master, we have QEMU data structures (bitmaps,
etc) which have one entry for each of GICV3_MAXIRQ irqs.
That includes the RAZ/WI unused space for the SGIs/PPIs, so
for a 1-bit-per-irq bitmap:
 [0x00000000, irq 32, irq 33, .... ]

When we fill in the values from KVM into these data structures,
we start after the unused space, because the for_each_dist_irq_reg()
macro starts with _irq = GIC_INTERNAL. But we forgot to adjust
the offset value we use for the KVM access, so we start by
reading the RAZ/WI values from KVM, and the data structure
contents end up with:
 [0x00000000, 0x00000000, irq 32, irq 33, ... ]
(and the last irqs wouldn't get transferred).

With this change to the code we will get the offset right and
the data structure will be filled as
 [0x00000000, irq 32, irq 33, .... ]

But for migration from the old version, the data structure we
receive from the migration source will contain the old broken
layout of
 [0x00000000, 0x00000000, irq 32, irq 33, ... ]
so if the new code doesn't do anything special to handle
migration from that old version then it will write zeroes to
irq 32..63, and then write incorrect values for all the irqs
after that, won't it?

That suggests to me that we need to have some code in the
migration post-load routine that identifies that the data is
coming from an old version with this bug, and shifts all the
data down in the arrays so that the code to write it to the
kernel can handle it.

thanks
-- PMM
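
To make the "shift all the data down" idea concrete, here is a rough sketch of what the fixup might look like for a single 1-bit-per-irq bitmap. It is plain C, not QEMU code: fixup_old_layout() and BITS_PER_WORD are hypothetical names made up for illustration, only GIC_INTERNAL comes from the discussion above, and the bitmap is assumed to be stored as 32-bit words. Registers with more than one bit per irq would need a correspondingly larger shift, and the question of how to detect that the data came from a buggy source is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GIC_INTERNAL  32  /* first SPI; irqs 0..31 are SGIs/PPIs */
#define BITS_PER_WORD 32  /* assumption: 1 bit per irq, 32 irqs per word */

/*
 * Hypothetical helper: take a bitmap received in the old, buggy layout
 *   [0, 0, irq 32.., irq 64.., ...]
 * and move the irq data down by one word so that it becomes
 *   [0, irq 32.., irq 64.., ..., 0]
 * The final word is cleared because the old code never transferred
 * the state of the last 32 irqs.
 */
static void fixup_old_layout(uint32_t *bmp, size_t nwords)
{
    /* Index of the word that should hold irq 32 onwards. */
    const size_t first = GIC_INTERNAL / BITS_PER_WORD;

    assert(nwords > first + 1);

    /* Regions overlap, so memmove() rather than memcpy(). */
    memmove(&bmp[first], &bmp[first + 1],
            (nwords - first - 1) * sizeof(bmp[0]));
    bmp[nwords - 1] = 0;
}
```

A post-load hook would presumably call something like this once per affected array, and only when it has decided that the incoming state uses the old layout.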