05.08.2011 02:19, Jan Kiszka wrote: [] > Resume on error in migrate_fd_put_buffer raced with migrate_fd_cleanup > triggered via migrate_fd_put_ready called from migrate_fd_connect. > > This migration code is a horrible maze. Patch below tries to move > monitor_resume a bit out of this. Please check if it works for you, then > I'll send it out properly.
Yes it's a maze. The patch was mime-damaged, I had to apply it manually, but it wasn't difficult (since it's understandable what it does etc). And now I can't trigger the original problem anymore, with any of my variants. Here we go: (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:cat > /tmp/foo" (qemu) migrate "exec:cat > /tmp/foo" (qemu) migrate "exec:foo" sh: foo: not found (qemu) migrate "exec:foo" sh: foo: not found (qemu) q As you can see, I can hit either of the two cases - with and without extra newline, and in both cases (and in successful case too) it works correctly. There's a difference still between the two - namely that extra newline - but that's something else and merely cosmetic. I also verified that -incoming migration works as expected, in both failure and success cases - and indeed it works. Speaking of the patch: shouldn't migrate_fd_close() be called from migrate_fd_cleanup(), or vise versa, or both be combined into one? In migrate_fd_cleanup(), the new logic is not clear still. New version of it: int migrate_fd_cleanup(FdMigrationState *s) { int ret = 0; qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL); if (s->file) { DPRINTF("closing file\n"); if (qemu_fclose(s->file) != 0) { ret = -1; } s->file = NULL; } else { if (s->mon) { monitor_resume(s->mon); } } if (s->fd != -1) { close(s->fd); s->fd = -1; } return ret; } Why it's EITHER s->file OR s->mon, but not both? Shouldn't the s->mon thing be unconditional? And the whole thing is waiting coroutine conversion badly, since the logic is indeed a maze ;) Thank you! /mjt