"Daniel P. Berrange" <berra...@redhat.com> wrote:
> On Tue, Sep 20, 2011 at 03:24:41PM +0200, Juan Quintela wrote:
>> If we have one error while migrating, and then we issue a
>> "migrate_cancel" command, the guest hangs.  Fix it by flushing only
>> when migration is in MIG_STATE_ACTIVE.  In case of error or
>> cancellation, don't flush.
>>
>> We had an infinite loop at buffered_close()
>>
>>     while (!s->has_error && s->buffer_size) {
>>         buffered_flush(s);
>>         if (s->freeze_output)
>>             s->wait_for_unfreeze(s);
>>     }
>>
>> There were no errors, there were things to send, and the connection
>> was broken.  send() returns -EAGAIN, so we froze output, but we
>> unfreeze_output and try again.
>
> I posted a couple of alternative approaches to fixing this
> hang problem
>
>   http://lists.nongnu.org/archive/html/qemu-devel/2011-08/msg03248.html
>
> My second approach of checking the migration state in migrate_fd_put_buffer()
> seems like it would be worthwhile, even with your patch as an additional
> safety net against bad code.

We can add that there, but in my tests s->write() was correctly
returning an error (or -EAGAIN).  The problem was that we were not
exiting the loop when we should have.

I agree that we can have *both* tests.  I will add your patch to my
series.

Thanks for the fast review.

Later, Juan.
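
For illustration only, here is a minimal, self-contained sketch of the
exit condition discussed in this thread: flush only while the migration
is still active.  It is not the actual patch.  Every name in it
(sketch_file, sketch_state, sketch_flush, sketch_close, and so on) is a
hypothetical stand-in, with SKETCH_ACTIVE playing the role of
MIG_STATE_ACTIVE and connection_broken modelling a send() that keeps
returning -EAGAIN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified migration states; SKETCH_ACTIVE stands in
 * for MIG_STATE_ACTIVE from the thread. */
enum sketch_state { SKETCH_ERROR, SKETCH_CANCELLED, SKETCH_ACTIVE };

/* Hypothetical stand-in for the buffered file state. */
struct sketch_file {
    enum sketch_state state;
    bool has_error;
    bool freeze_output;
    size_t buffer_size;      /* bytes still waiting to be sent */
    bool connection_broken;  /* models send() always returning -EAGAIN */
    int flush_calls;         /* counts flush attempts, for inspection */
};

/* Try to send the pending data.  On a broken connection nothing is
 * written and output is frozen, as with -EAGAIN. */
static void sketch_flush(struct sketch_file *s)
{
    s->flush_calls++;
    if (s->connection_broken) {
        s->freeze_output = true;
        return;
    }
    s->buffer_size = 0;
    s->freeze_output = false;
}

/* Models wait_for_unfreeze(): output is unfrozen so we try again. */
static void sketch_wait_for_unfreeze(struct sketch_file *s)
{
    s->freeze_output = false;
}

/* Close path: the extra state check means an errored or cancelled
 * migration never enters the flush loop, so it cannot spin on a
 * broken connection.  Returns the number of bytes left unsent. */
size_t sketch_close(struct sketch_file *s)
{
    while (s->state == SKETCH_ACTIVE && !s->has_error && s->buffer_size) {
        sketch_flush(s);
        if (s->freeze_output) {
            sketch_wait_for_unfreeze(s);
        }
    }
    return s->buffer_size;
}
```

With state set to SKETCH_CANCELLED and connection_broken set,
sketch_close() returns at once and leaves the buffer unsent, instead of
looping.  Without the state check, the same input would loop forever,
which is the hang described above.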