On 08/07/2017 07:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> 07.08.2017 14:52, Eric Blake wrote:
>> On 08/04/2017 10:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Set reply.handle to 0 on error path to prevent normal path of
>>> nbd_co_receive_reply.
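For anyone reading along without the patch in front of them: the change
under discussion is a one-line insertion in the reply-reading coroutine
of block/nbd-client.c (nbd_read_reply_entry, if I have the name right).
A rough sketch of the shape of that loop, paraphrased from memory rather
than copied from the actual hunk, with the proposed line marked:

    for (;;) {
        assert(s->reply.handle == 0);
        ret = nbd_receive_reply(s->ioc, &s->reply); /* arguments abridged */
        if (ret <= 0) {
            s->reply.handle = 0; /* proposed one-liner: a failed read may
                                  * already have filled in s->reply.handle,
                                  * so clear it to keep the waiter off the
                                  * normal path */
            break;
        }
        /* ... wake the coroutine whose request matches s->reply.handle,
         * then yield until it has consumed s->reply ... */
    }

The subtle part, which the gdb reproduction below demonstrates, is that
nbd_receive_reply() can fail after it has already parsed the handle out
of the reply header (judging from the repro, the handle field is filled
in by the time the bad magic is rejected), so without the fix the
coroutine waiting in nbd_co_receive_reply() sees a handle that matches
its request and treats the failed reply as a normal one.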
Side note: in general, our server must allow a handle of 0 as valid (as
the handle is something chosen by the client); but our particular client
always uses non-zero handles (and therefore using 0 as a sentinel for an
error path is okay).

>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>> ---
>>> block/nbd-client.c | 1 +
>>> 1 file changed, 1 insertion(+)

>> Can you document a case where not fixing this would be an observable bug
>> (even if it requires using gdb and single-stepping between client and
>> server to make what is otherwise a racy situation easy to see)? I'm
>> trying to figure out if this is 2.10 material.
>
> it is simple enough:
>
> run qemu-nbd in gdb, set a break on nbd_send_reply, and when it shoots,
> next up to "stl_be_p(buf, NBD_REPLY_MAGIC);" and after it do "call
> stl_be_p(buf, 1000)"

Okay, so you have set up a malicious server intentionally sending bad
magic...

>
> run qemu-io with some read in gdb, set break on
> br block/nbd-client.c:83
>
> (it is the break; statement after the failed nbd_receive_reply call)
>
> and on
> br block/nbd-client.c:170
>
> (it is in nbd_co_receive_reply, after the yield)
>
> on the first break we will be sure that nbd_receive_reply failed;
> on the second we will be sure by
> (gdb) p s->reply
> $1 = {handle = 93825000680144, error = 0}
> (gdb) p request->handle
> $2 = 93825000680144
>
> that we are on the normal receiving path.

...and the client is recognizing that the server sent garbage, but then
proceeds to handle the packet anyway. The ideal reaction is that the
client should disconnect from the server rather than handle the packet.
But because it didn't do that, the client is now unable to receive any
future messages from the server.

Compare the traces of:

$ ./qemu-io -c 'r 0 1' -f raw nbd://localhost:10809/foo --trace nbd_\*

against a working server:

29103@1502118291.281180:nbd_opt_go_success Export is good to go
29103@1502118291.281397:nbd_send_request Sending request to server: { .from = 0, .len = 1, .handle = 93860726319200, .flags = 0x0, .type = 0 (read) }
29103@1502118291.281705:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0.0003 sec (2.822 KiB/sec and 2890.1734 ops/sec)
29103@1502118291.281773:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 93860726319200, .flags = 0x0, .type = 3 (flush) }
29103@1502118291.281902:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
29103@1502118291.281939:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 93860726319200, .flags = 0x0, .type = 3 (flush) }
29103@1502118291.282064:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
29103@1502118291.282078:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 0, .flags = 0x0, .type = 2 (discard) }

followed by a clean disconnect; vs.
the buggy server:

29148@1502118384.385133:nbd_opt_go_success Export is good to go
29148@1502118384.385321:nbd_send_request Sending request to server: { .from = 0, .len = 1, .handle = 94152262559840, .flags = 0x0, .type = 0 (read) }
29148@1502118396.494643:nbd_receive_reply Got reply: { magic = 0x1446698, .error = 0, handle = 94152262559840 }
invalid magic (got 0x1446698)
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0:00:12.10 (0.082581 bytes/sec and 0.0826 ops/sec)
29148@1502118396.494746:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 94152262559840, .flags = 0x0, .type = 3 (flush) }

where the client is now hung.

Thankfully, the client is blocked in an idle loop (not eating CPU), so I
don't know if this counts as the ability for a malicious server to cause
a denial of service against qemu as an NBD client (in general, being
unable to read further data from the server because client internal
state is now botched is not that much different from being unable to
read further data from the server because the client hung up on the
invalid server), unless there is some way to cause qemu to die from an
assertion failure rather than just get stuck.

It also looks like the client acts on at most one bad packet from the
server before it gets stuck - but the question is whether a malicious
server could, in that one bad packet reply, cause qemu to misbehave in
some other way. I'm adding Prasad, to analyze whether this needs a CVE.

We don't have a good protocol fuzzer to simulate a bad client and/or bad
server as the partner over the socket - if you can find any more such
interactions where a bad partner can cause a hang or crash, let's get
those fixed in 2.10.

Meanwhile, I'll probably include this patch in 2.10 (after first
figuring out if it works in isolation or needs other patches from your
series).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org