On 08/11/2017 09:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.08.2017 17:15, Eric Blake wrote:
>> On 08/11/2017 02:48 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 11.08.2017 05:37, Eric Blake wrote:
>>>> As soon as the server is sending us garbage, we should quit
>>>> trying to send further messages to the server, and allow all
>>>> pending coroutines for any remaining replies to error out.
>>>> Failure to do so can let a malicious server cause the client
>>>> to hang, for example, if the server sends an invalid magic
>>>> number in its response.
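
(For clarity, a rough sketch of the error path that commit message is
describing - the wrapper name below is hypothetical and the field/helper
names only loosely follow block/nbd-client.c, so read this as my
approximation rather than the literal patch:)

/* Sketch: once the reply loop sees garbage from the server (such as a
 * bad reply magic number), stop using the connection and wake every
 * coroutine still waiting on a reply so it errors out instead of
 * hanging. */
static void nbd_channel_fatal_error(NBDClientSession *s)
{
    /* Assumed flag: remember the failure so that later requests are
     * refused instead of being written to a broken connection. */
    s->quit = true;

    /* Force any in-flight channel I/O to fail promptly. */
    if (s->ioc) {
        qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
    }

    /* Schedule every pending reply coroutine; each one observes the
     * error and returns it to its caller rather than waiting forever. */
    nbd_recv_coroutines_enter_all(s);
}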
>> The nbd_recv_coroutines_enter_all() call schedules all pending
>> nbd_co_send_request coroutines to fire as soon as the current coroutine
>> reaches a yield point. The next yield point is during the
>> BDRV_POLL_WHILE of nbd_teardown_connection - but this is AFTER we've
>> called qio_channel_shutdown() - so as long as nbd_rwv() is called with
>> a valid ioc, the qio code should recognize that we are shutting down
>> the connection and gracefully give an error on each write attempt.
>
> Hmm, was it correct even before your patch? Is it safe to enter a
> coroutine (which we've scheduled by nbd_recv_coroutines_enter_all()),
> which is actually yielded inside nbd_rwv (not our yield in
> nbd_co_receive_reply)?

I'm honestly not sure how to answer the question. In my testing, I was
unable to catch a coroutine yielding inside of nbd_rwv(); but I was able
to easily provoke a situation where the client sends two or more
commands before the server gets a chance to reply to either:

./qemu-io -f raw nbd://localhost:10809/foo \
  -c 'aio_read 0 512' -c 'aio_write 1k 1m'

where tracing the server proves that it received both commands before
sending a reply; when the client sends two aio_read commands, I could
even observe the server replying to the second read before the first.
So I'm definitely provoking parallel coroutines. But even without my
tentative squash patch, I haven't been able to observe s->ioc change
from valid to NULL within the body of nbd_co_send_request - either the
entire request is skipped because ioc was already cleared, or the entire
request operates on a valid ioc (although the request may still fail
with EPIPE because the ioc has started shutting down). I even tried
varying the size of the aio_write: with 1M, the client got the write
request sent off before the server's reply, but 2M was large enough that
the server sent the read reply before the client could send the write.

Since we are using a mutex, at most one coroutine can attempt a write at
once; but that says nothing about how many other parallel coroutines can
wake up to do a read. I also think the fact that we are using qio's
set_cork around the paired writes is helping: although we have two write
calls (nbd_send_request() for the NBD_CMD_WRITE header, then nbd_rwv()
for the data), the header is small enough that the qio layer waits for
the second call before actually sending anything over the wire (if
anything, we are more likely to see s->ioc change before the final
set_cork call, rather than between the two writes - if that change can
even happen).

>>
>> I see your point about the fact that coroutines can change hands in
>> between our two writes for an NBD_CMD_WRITE in nbd_co_send_request()
>> (the first write is nbd_send_request() for the header, the second is
>> nbd_rwv() for the data) - if between those two writes we process a
>> failing read, I see your point about us risking re-reading s->ioc as
>
> But there are no yields between the two writes, so, if the previous
> logic is correct, if the read fails during the first write it will
> return an error and we will not go into the second write. If it fails
> during the second write, it should be OK too.

If we can ever observe s->ioc changing to NULL, then my followup squash
patch is needed (if nothing else, calling qio_channel_set_cork(NULL,
false) will crash). But I'm not familiar enough with coroutines to know
whether that is possible, or just paranoia on my part.
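
To make the set_cork concern concrete, the NBD_CMD_WRITE send path looks
roughly like this (heavily simplified from nbd_co_send_request(), with
argument lists from memory; the s->ioc re-check before uncorking is the
tentative squash, not code that is actually committed):

    qemu_co_mutex_lock(&s->send_mutex);     /* at most one sender at a time */

    if (!s->ioc) {                          /* connection already torn down */
        qemu_co_mutex_unlock(&s->send_mutex);
        return -EPIPE;
    }

    if (qiov) {
        qio_channel_set_cork(s->ioc, true);
        rc = nbd_send_request(s->ioc, request);  /* write 1: request header */
        if (rc >= 0 &&
            nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len,
                    false) < 0) {                /* write 2: data payload */
            rc = -EIO;
        }
        /* Tentative squash: re-check s->ioc in case a failing read cleared
         * it while we held the mutex; qio_channel_set_cork(NULL, false)
         * would crash.  Whether s->ioc can actually change here is exactly
         * the open question in this thread. */
        if (s->ioc) {
            qio_channel_set_cork(s->ioc, false);
        }
    } else {
        rc = nbd_send_request(s->ioc, request);
    }

    qemu_co_mutex_unlock(&s->send_mutex);
    return rc;

Either way, the extra guard is cheap; the question is only whether it can
ever be needed.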
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org