04.08.2011 23:24, Luiz Capitulino пишет: > On Wed, 3 Aug 2011 18:51:44 +0400 > Michael Tokarev <m...@tls.msk.ru> wrote: > >> If we do, it results in double monitor_resume() (second being called >> from migrate_fd_cleanup() anyway) and monitor suspend count becoming >> negative. > > Are you hitting an specific issue or did you find this by code inspection?
The specific case I'm talking about can trivially be triggered. Run migrate "exec:nosuchfile" and your monitor will be stuck forever. > IIRC, I asked Marcelo to add the monitor_resume() call in the fix for > e447b1a603 because the monitor wasn't being resumed in some cases. Don't > remember which though, do you Marcelo? And this (e447b1a603) is actually exactly the same thing: when the child terminates before qemu completes sending stuff to it. > I see two possibilities here: > > 1. After e447b1a603 there was some change that made the monitor_resume() call > in migrate_fd_put_buffer() unnecessary > > 2. We're calling it in the wrong place > > Taking a quick look at the code I see that migrate_fd_cleanup() doesn't > seem to be called when qemu_savevm_state_iterate() fails, for example. Yes indeed, but that will lead to a havoc in other places, like leaving callback notifier registered even after migration gets cancelled, or keeping the file open. Note that qemu_savevm_state_iterate() gets called from a callback routine. What I observe here is that when the exec: process exits prematurely, qemu will continue writing stuff to pipe up till migrate_fd_put_buffer() hits error (EPIPE iirc), at which point we'll call mon->resume() twice. The same happens in case of tcp migration when the other end closes the connection. I don't know when qemu_savevm_state_iterate() may fail, or why qemu_file_has_error() in there does not return true. Maybe the problem here is much more severe actually, -- that's why I initially asked what it is all about, -- before offering the patch. Thanks, /mjt >> Signed-Off-By: Michael Tokarev <m...@tls.msk.ru> >> Reviewed-By: Jan Kiszka <jan.kis...@siemens.com> >> --- >> migration.c | 3 --- >> 1 files changed, 0 insertions(+), 3 deletions(-) >> >> diff --git a/migration.c b/migration.c >> index 2a15b98..7ca883f 100644 >> --- a/migration.c >> +++ b/migration.c >> @@ -330,9 +330,6 @@ ssize_t migrate_fd_put_buffer(void *opaque, const void >> *data, size_t size) >> if (ret == -EAGAIN) { >> qemu_set_fd_handler2(s->fd, NULL, NULL, migrate_fd_put_notify, s); >> } else if (ret < 0) { >> - if (s->mon) { >> - monitor_resume(s->mon); >> - } >> s->state = MIG_STATE_ERROR; >> notifier_list_notify(&migration_state_notifiers, NULL); >> } >