On 16.12.2016 18:03, Dr. David Alan Gilbert wrote:
> * Thomas Huth (th...@redhat.com) wrote:
>> On 18.11.2016 09:13, Thomas Huth wrote:
>>> On 17.11.2016 04:45, David Gibson wrote:
>>>> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>>>>> Thomas Huth <th...@redhat.com> wrote:
>>>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>>>> when they are done, and 0 if there is still something left to do.
>>>>>> However, ram_save_iterate() does not obey this rule and returns
>>>>>> the number of saved pages instead. This causes a fatal hang with
>>>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>>>>
>>>>>>   qemu-img create -f qcow2 /tmp/test.qcow2 1M
>>>>>>   qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>>>     -hda /tmp/test.qcow2 -serial mon:stdio
>>>>>>
>>>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>>>> save a snapshot with "savevm test1" for example.
>>>>>>
>>>>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>>>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>>>>> can only "kill -9" the QEMU process.
>>>>>> Fix it by using proper return values in ram_save_iterate().
>>>>>>
>>>>>> Signed-off-by: Thomas Huth <th...@redhat.com>
>>>>>
>>>>> Reviewed-by: Juan Quintela <quint...@redhat.com>
>>>>>
>>>>> Applied.
>>>>>
>>>>> I don't know how we broked this so much.
>>>>
>>>> Note that block save iterate has the same bug...
>>>
>>> I think you're right. Care to send a patch?
>>
>> Looking at this issue again ... could it be that block_save_iterate() is
>> currently just dead code?
>> As far as I can see, the ->save_live_iterate() handlers are only called
>> from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
>> only calls the handlers if se->ops->is_active(se->opaque) returns true.
>> But block_is_active() seems to only return 0 during savevm, most likely
>> because qemu_savevm_state() explicitly sets the "blk" and "shared"
>> MigrationParams to zero.
>> So to me, it looks like we could also just remove block_save_iterate()
>> completely ... or did I miss something here?
>
> Doesn't it get called by migrate -b ?

Ah, right, yes, I somehow missed that ... I probably shouldn't do such
experiments at the end of Friday afternoon ;-)

OK, so it seems that
- block_save_iterate() is not called during savevm at all (and thus the
  bad return code does not matter here)
- migrate -b runs block_save_iterate() but the return code is ignored
  in migration_thread()

So we do not have a real problem here, but I think we should still clean
up the return code of block_save_iterate() to be on the safe side for
the future...

 Thomas
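
For anyone following along, here is a rough toy model of the return-value convention discussed in this thread: an iterate handler should return 1 when it is done and 0 while work remains. This is only a sketch and not the QEMU code; ToyState, toy_iterate_buggy(), toy_iterate_fixed() and toy_drive() are hypothetical names made up for illustration, and the round cap exists only so that the buggy case terminates.

```c
#include <assert.h>

/* Hypothetical toy state: number of pages that still have to be saved. */
typedef struct {
    int pages_left;
} ToyState;

/* Buggy style: returns the number of pages saved by this call.
 * Once nothing is left to save, it keeps returning 0. */
static int toy_iterate_buggy(ToyState *s)
{
    int saved = s->pages_left;

    s->pages_left = 0;
    return saved;
}

/* Convention from the thread: return 1 when done, 0 if work remains.
 * This toy version saves at most one page per call. */
static int toy_iterate_fixed(ToyState *s)
{
    if (s->pages_left > 0) {
        s->pages_left--;
    }
    return s->pages_left == 0;
}

/* Toy caller: invoke the handler until it reports "done" (non-zero).
 * Returns the number of rounds used, or -1 if max_rounds was reached,
 * which stands in for the endless loop described above. */
static int toy_drive(int (*iterate)(ToyState *), ToyState *s, int max_rounds)
{
    assert(max_rounds > 0);

    for (int round = 1; round <= max_rounds; round++) {
        if (iterate(s) != 0) {
            return round;
        }
    }
    return -1;
}
```

With nothing left to save, toy_drive() never sees a "done" result from the buggy handler and runs into the cap, while the fixed handler finishes in the first round.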