* Thomas Huth (th...@redhat.com) wrote:
> On 09.11.2016 08:18, Amit Shah wrote:
> > On (Fri) 04 Nov 2016 [14:10:17], Thomas Huth wrote:
> >> qemu_savevm_state_iterate() expects the iterators to return 1
> >> when they are done, and 0 if there is still something left to do.
> >> However, ram_save_iterate() does not obey this rule and returns
> >> the number of saved pages instead. This causes a fatal hang with
> >> ppc64 guests when you run QEMU like this (also works with TCG):
> >
> > "works with" -- does that mean reproduces with?
>
> Yes, that's what I've meant: You can reproduce it with TCG (e.g. running
> on a x86 system), too, there's no need for a real POWER machine with KVM
> here.

How did you trigger it on x86?

Dave

> >> qemu-img create -f qcow2 /tmp/test.qcow2 1M
> >> qemu-system-ppc64 -nographic -nodefaults -m 256 \
> >>     -hda /tmp/test.qcow2 -serial mon:stdio
> >>
> >> ... then switch to the monitor by pressing CTRL-a c and try to
> >> save a snapshot with "savevm test1" for example.
> >>
> >> After the first iteration, ram_save_iterate() always returns 0 here,
> >> so that qemu_savevm_state_iterate() hangs in an endless loop and you
> >> can only "kill -9" the QEMU process.
> >> Fix it by using proper return values in ram_save_iterate().
> >>
> >> Signed-off-by: Thomas Huth <th...@redhat.com>
> >> ---
> >>  migration/ram.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index fb9252d..a1c8089 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1987,7 +1987,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>      int ret;
> >>      int i;
> >>      int64_t t0;
> >> -    int pages_sent = 0;
> >> +    int done = 0;
> >>
> >>      rcu_read_lock();
> >>      if (ram_list.version != last_version) {
> >> @@ -2007,9 +2007,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          pages = ram_find_and_save_block(f, false, &bytes_transferred);
> >>          /* no more pages to sent */
> >>          if (pages == 0) {
> >> +            done = 1;
> >>              break;
> >>          }
> >> -        pages_sent += pages;
> >>          acct_info.iterations++;
> >>
> >>          /* we want to check in the 1st loop, just in case it was the 1st time
> >> @@ -2044,7 +2044,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> >>          return ret;
> >>      }
> >>
> >> -    return pages_sent;
> >> +    return done;
> >>  }
> >
> > I agree with David, we can just remove the return value. The first
> > patch of the series can do that; and this one could become the 2nd
> > patch. Should be OK for the soft freeze.
>
> Sorry, I still did not quite get it - if I'd change the return type of
> ram_save_iterate() and the other iterate functions to "void", how is
> qemu_savevm_state_iterate() supposed to know whether all iterators are
> done or not? And other iterators also use negative return values to
> signal errors - should that then be handled via an "Error **" parameter
> instead? ... my gut feeling still says that such a bigger rework (we've
> got to touch all iterators for this!) should rather not be done right in
> the middle of the freeze period...
>
>  Thomas
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
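
For readers following the thread, the return-value contract under discussion (1 when an iterator is done, 0 when work remains) can be illustrated with a small standalone sketch. This is not QEMU code: struct toy_ram, toy_iterate_pages_sent(), toy_iterate_done() and toy_drive() are hypothetical names invented here, the per-call page budget is an assumption, and the negative-error path mentioned above is left out. The driver only assumes that the caller keeps polling until the iterator returns a value greater than 0.

```c
#include <assert.h>

/* Hypothetical stand-in for guest RAM: just a count of unsaved pages. */
struct toy_ram {
    int pages_left;
};

typedef int (*toy_iter_fn)(struct toy_ram *ram, int budget);

/* Pre-patch convention: return the number of pages saved in this call. */
int toy_iterate_pages_sent(struct toy_ram *ram, int budget)
{
    int pages_sent = 0;

    while (budget-- > 0) {
        if (ram->pages_left == 0) {
            break;              /* nothing left, but the caller sees 0 */
        }
        ram->pages_left--;
        pages_sent++;
    }
    return pages_sent;
}

/* Patched convention: return 1 once no pages are left, 0 otherwise. */
int toy_iterate_done(struct toy_ram *ram, int budget)
{
    int done = 0;

    while (budget-- > 0) {
        if (ram->pages_left == 0) {
            done = 1;
            break;
        }
        ram->pages_left--;
    }
    return done;
}

/*
 * Toy caller: poll the iterator until it returns > 0 ("done").
 * Returns the round in which that happened, or 0 if max_rounds was
 * reached first (the real loop has no such limit and would spin).
 */
int toy_drive(toy_iter_fn fn, struct toy_ram *ram, int budget, int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 1; round <= max_rounds; round++) {
        if (fn(ram, budget) > 0) {
            return round;
        }
    }
    return 0;
}
```

In this toy model, once pages_left reaches 0 the page-count variant returns 0 on every call, so a caller that reads 0 as "not done yet" never leaves the loop. It also reports a positive value while pages are still outstanding, which the caller would misread as "done".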