On 08/07/2017 07:56 AM, Vladimir Sementsov-Ogievskiy wrote:
> 07.08.2017 14:52, Eric Blake wrote:
>> On 08/04/2017 10:14 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Set reply.handle to 0 on error path to prevent normal path of
>>> nbd_co_receive_reply.
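For anyone reading along without the patch in front of them: the change
under discussion is a one-line insertion in the reply-reading coroutine
of block/nbd-client.c (nbd_read_reply_entry, if I have the name right).
A rough sketch of the shape of that loop, paraphrased from memory rather
than copied from the actual hunk, with the proposed line marked:

    for (;;) {
        assert(s->reply.handle == 0);
        ret = nbd_receive_reply(s->ioc, &s->reply); /* arguments abridged */
        if (ret <= 0) {
            s->reply.handle = 0; /* proposed one-liner: a failed read may
                                  * already have filled in s->reply.handle,
                                  * so clear it to keep the waiter off the
                                  * normal path */
            break;
        }
        /* ... wake the coroutine whose request matches s->reply.handle,
         * then yield until it has consumed s->reply ... */
    }

The subtle part, which the gdb reproduction below demonstrates, is that
nbd_receive_reply() can fail after it has already parsed the handle out
of the reply header (judging from the repro, the handle field is filled
in by the time the bad magic is rejected), so without the fix the
coroutine waiting in nbd_co_receive_reply() sees a handle that matches
its request and treats the failed reply as a normal one.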
Side note: in general, our server must allow a handle of 0 as valid (as
the handle is something chosen by the client); but our particular client
always uses non-zero handles (and therefore using 0 as a sentinel for an
error path is okay).

>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>> ---
>>> block/nbd-client.c | 1 +
>>> 1 file changed, 1 insertion(+)

>> Can you document a case where not fixing this would be an observable bug
>> (even if it requires using gdb and single-stepping between client and
>> server to make what is otherwise a racy situation easy to see)? I'm
>> trying to figure out if this is 2.10 material.
>
> it is simple enough:
>
> run qemu-nbd in gdb, set a break on nbd_send_reply, and when it shoots,
> next up to "stl_be_p(buf, NBD_REPLY_MAGIC);" and after it do "call
> stl_be_p(buf, 1000)"

Okay, so you have set up a malicious server intentionally sending bad
magic...

>
> run qemu-io with some read in gdb, set break on
> br block/nbd-client.c:83
>
> (it is the break; statement after the failed nbd_receive_reply call)
>
> and on
> br block/nbd-client.c:170
>
> (it is in nbd_co_receive_reply, after the yield)
>
> on the first break we will be sure that nbd_receive_reply failed;
> on the second we will be sure by
> (gdb) p s->reply
> $1 = {handle = 93825000680144, error = 0}
> (gdb) p request->handle
> $2 = 93825000680144
>
> that we are on the normal receiving path.

...and the client is recognizing that the server sent garbage, but then
proceeds to handle the packet anyway. The ideal reaction is that the
client should disconnect from the server rather than handle the packet.
But because it didn't do that, the client is now unable to receive any
future messages from the server.

Compare the traces of:

$ ./qemu-io -c 'r 0 1' -f raw nbd://localhost:10809/foo --trace nbd_\*

against a working server:

29103@1502118291.281180:nbd_opt_go_success Export is good to go
29103@1502118291.281397:nbd_send_request Sending request to server: { .from = 0, .len = 1, .handle = 93860726319200, .flags = 0x0, .type = 0 (read) }
29103@1502118291.281705:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0.0003 sec (2.822 KiB/sec and 2890.1734 ops/sec)
29103@1502118291.281773:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 93860726319200, .flags = 0x0, .type = 3 (flush) }
29103@1502118291.281902:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
29103@1502118291.281939:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 93860726319200, .flags = 0x0, .type = 3 (flush) }
29103@1502118291.282064:nbd_receive_reply Got reply: { magic = 0x67446698, .error = 0, handle = 93860726319200 }
29103@1502118291.282078:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 0, .flags = 0x0, .type = 2 (discard) }

followed by a clean disconnect; vs.
the buggy server:

29148@1502118384.385133:nbd_opt_go_success Export is good to go
29148@1502118384.385321:nbd_send_request Sending request to server: { .from = 0, .len = 1, .handle = 94152262559840, .flags = 0x0, .type = 0 (read) }
29148@1502118396.494643:nbd_receive_reply Got reply: { magic = 0x1446698, .error = 0, handle = 94152262559840 }
invalid magic (got 0x1446698)
read 1/1 bytes at offset 0
1 bytes, 1 ops; 0:00:12.10 (0.082581 bytes/sec and 0.0826 ops/sec)
29148@1502118396.494746:nbd_send_request Sending request to server: { .from = 0, .len = 0, .handle = 94152262559840, .flags = 0x0, .type = 3 (flush) }

where the client is now hung.

Thankfully, the client is blocked in an idle loop (not eating CPU), so I
don't know if this counts as the ability for a malicious server to cause
a denial of service against qemu as an NBD client (in general, being
unable to read further data from the server because client internal
state is now botched is not that much different from being unable to
read further data from the server because the client hung up on the
invalid server), unless there is some way to cause qemu to die from an
assertion failure rather than just get stuck.

It also looks like the client acts on at most one bad packet from the
server before it gets stuck - but the question is whether a malicious
server could, in that one bad packet reply, cause qemu to misbehave in
some other way. I'm adding Prasad, to analyze whether this needs a CVE.

We don't have a good protocol fuzzer to simulate a bad client and/or bad
server as the partner over the socket - if you can find any more such
interactions where a bad partner can cause a hang or crash, let's get
those fixed in 2.10.

Meanwhile, I'll probably include this patch in 2.10 (after first
figuring out if it works in isolation or needs other patches from your
series).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3266
Virtualization:  qemu.org | libvirt.org