On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote: > 15.08.2017 00:34, Eric Blake wrote: >> When we switched NBD to use coroutines for qemu 2.9 (in particular, >> commit a12a712a), we introduced a regression: if a server sends us >> garbage (such as a corrupted magic number), we quit the read loop >> but do not stop sending further queued commands, resulting in the >> client hanging when it never reads the response to those additional >> commands. In qemu 2.8, we properly detected that the server is no >> longer reliable, and cancelled all existing pending commands with >> EIO, then tore down the socket so that all further command attempts >> get EPIPE. >> >> Restore the proper behavior of quitting (almost) all communication >> with a broken server: Once we know we are out of sync or otherwise >> can't trust the server, we must assume that any further incoming >> data is unreliable and therefore end all pending commands with EIO, >> and quit trying to send any further commands. As an exception, we >> still (try to) send NBD_CMD_DISC to let the server know we are going >> away (in part, because it is easier to do that than to further >> refactor nbd_teardown_connection, and in part because it is the >> only command where we do not have to wait for a reply). >> >> Based on a patch by Vladimir Sementsov-Ogievskiy. >>
>> +++ b/block/nbd-client.c >> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void >> *opaque) >> int ret; >> Error *local_err = NULL; >> >> - for (;;) { >> + while (!s->quit) { > > looks like quit will never be true here > >> assert(s->reply.handle == 0); >> ret = nbd_receive_reply(s->ioc, &s->reply, &local_err); >> if (ret < 0) { >> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void >> *opaque) >> qemu_coroutine_yield(); >> } >> >> + if (ret < 0) { >> + s->quit = true; > > so, you set quit only here.. if we fail on some write, reading coroutine > will not > know about it and will continue reading.. Looks like you are correct - your version set the flag in more places than me, but it looks like you're right that we DO want to set the flag when writing hits a known failure. Here's what I plan to squash in: diff --git i/block/nbd-client.c w/block/nbd-client.c index bb17e3da45..422ecb4307 100644 --- i/block/nbd-client.c +++ w/block/nbd-client.c @@ -161,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs, } else { rc = nbd_send_request(s->ioc, request); } + if (rc < 0) { + s->quit = true; + } qemu_co_mutex_unlock(&s->send_mutex); return rc; } -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature