On 2011-08-05 08:51, Michael Tokarev wrote: > 05.08.2011 02:19, Jan Kiszka wrote: > [] >> Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup >> triggered via migrate_fd_put_ready called from migrate_fd_connect. >> >> This migration code is a horrible maze. Patch below tries to move >> monitor_resume a bit out of this. Please check if it works for you, then >> I'll send it out properly. > > Yes it's a maze. > > The patch was mime-damaged, I had to apply it manually, but it > wasn't difficult (since it's understandable what it does etc).
Sorry, likely forgot to disable line-wrapping before hitting send. > > And now I can't trigger the original problem anymore, with any > of my variants. > > Here we go: > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:cat > /tmp/foo" > (qemu) migrate "exec:cat > /tmp/foo" > (qemu) migrate "exec:foo" > sh: foo: not found > (qemu) migrate "exec:foo" > sh: foo: not found > > (qemu) q > > As you can see, I can hit either of the two cases - with > and without extra newline, and in both cases (and in > successful case too) it works correctly. > > There's a difference still between the two - namely that > extra newline - but that's something else and merely > cosmetic. > > I also verified that -incoming migration works as expected, > in both failure and success cases - and indeed it works. > > Speaking of the patch: shouldn't migrate_fd_close() be > called from migrate_fd_cleanup(), Maybe. But it requires careful review what the migration close callbacks are doing and if that is always equivalent (or harmless). > or vise versa, or both > be combined into one? In migrate_fd_cleanup(), the new > logic is not clear still. New version of it: > > int migrate_fd_cleanup(FdMigrationState *s) > { > int ret = 0; > > qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL); > > if (s->file) { > DPRINTF("closing file\n"); > if (qemu_fclose(s->file) != 0) { > ret = -1; > } > s->file = NULL; > } else { > if (s->mon) { > monitor_resume(s->mon); > } > } > > if (s->fd != -1) { > close(s->fd); > s->fd = -1; > } > > return ret; > } > > Why it's EITHER s->file OR s->mon, but not both? Shouldn't > the s->mon thing be unconditional? See description of the patch I just sent out. > > And the whole thing is waiting coroutine conversion badly, > since the logic is indeed a maze ;) Yeah, though I think this will not magically remove all windings here. Jan
signature.asc
Description: OpenPGP digital signature