> Am 15.12.2014 um 17:00 schrieb Kevin Wolf <kw...@redhat.com>: > > Am 15.12.2014 um 16:52 hat Peter Lieven geschrieben: >> On 15.12.2014 16:43, Peter Lieven wrote: >>> On 15.12.2014 16:01, Kevin Wolf wrote: >>>> Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben: >>>>> this patch finally introduces multiread support to virtio-blk. While >>>>> multiwrite support was there for a long time, read support was missing. >>>>> >>>>> To achieve this the patch does several things which might need further >>>>> explanation: >>>>> >>>>> - the whole merge and multireq logic is moved from block.c into >>>>> virtio-blk. This is move is a preparation for directly creating a >>>>> coroutine out of virtio-blk. >>>>> >>>>> - requests are only merged if they are strictly sequential, and no >>>>> longer sorted. This simplification decreases overhead and reduces >>>>> latency. It will also merge some requests which were unmergable before. >>>>> >>>>> The old algorithm took up to 32 requests, sorted them and tried to merge >>>>> them. The outcome was anything between 1 and 32 requests. In case of >>>>> 32 requests there were 31 requests unnecessarily delayed. >>>>> >>>>> On the other hand let's imagine e.g. 16 unmergeable requests followed >>>>> by 32 mergable requests. The latter 32 requests would have been split >>>>> into two 16 byte requests. >>>>> >>>>> Last the simplified logic allows for a fast path if we have only a >>>>> single request in the multirequest. In this case the request is sent as >>>>> ordinary request without multireq callbacks. >>>>> >>>>> As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The >>>>> number of >>>>> merged requests is in the same order while the write latency is obviously >>>>> decreased by several percent. >>>>> >>>>> cmdline: >>>>> qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom >>>>> ubuntu-14.04.1-server-amd64.iso \ >>>>> -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none -monitor >>>>> stdio >>>>> >>>>> Before: >>>>> virtio0: >>>>> rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614 >>>>> wr_operations=67979 >>>>> flush_operations=15335 wr_total_time_ns=540428034217 >>>>> rd_total_time_ns=11110520068 >>>>> flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531 >>>>> >>>>> After: >>>>> virtio0: >>>>> rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148 >>>>> wr_operations=68578 >>>>> flush_operations=15368 wr_total_time_ns=437030089565 >>>>> rd_total_time_ns=9836288815 >>>>> flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615 >>>>> >>>>> Some first numbers of improved read performance while booting: >>>>> >>>>> The Ubuntu 14.04.1 vServer from above: >>>>> virtio0: >>>>> rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26 >>>>> flush_operations=2 wr_total_time_ns=8847669 rd_total_time_ns=13952575478 >>>>> flush_total_time_ns=3075496 rd_merged=742 wr_merged=0 >>>>> >>>>> Windows 2012R2 (booted from iSCSI): >>>>> virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 >>>>> wr_operations=360 >>>>> flush_operations=68 wr_total_time_ns=34344992718 >>>>> rd_total_time_ns=134386844669 >>>>> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216 >>>>> >>>>> Signed-off-by: Peter Lieven <p...@kamp.de> >>>> Looks pretty good. The only thing I'm still unsure about are possible >>>> integer overflows in the merging logic. Maybe you can have another look >>>> there (ideally not only the places I commented on below, but the whole >>>> function). >>>> >>>>> @@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, >>>>> MultiReqBuffer *mrb) >>>>> iov_from_buf(in_iov, in_num, 0, serial, size); >>>>> virtio_blk_req_complete(req, VIRTIO_BLK_S_OK); >>>>> virtio_blk_free_request(req); >>>>> - } else if (type & VIRTIO_BLK_T_OUT) { >>>>> - qemu_iovec_init_external(&req->qiov, iov, out_num); >>>>> - virtio_blk_handle_write(req, mrb); >>>>> - } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) { >>>>> - /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */ >>>>> - qemu_iovec_init_external(&req->qiov, in_iov, in_num); >>>>> - virtio_blk_handle_read(req); >>>>> - } else { >>>>> + break; >>>>> + } >>>>> + case VIRTIO_BLK_T_IN: >>>>> + case VIRTIO_BLK_T_OUT: >>>>> + { >>>>> + bool is_write = type & VIRTIO_BLK_T_OUT; >>>>> + int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev), >>>>> + &req->out.sector); >>>>> + int max_transfer_length = >>>>> blk_get_max_transfer_length(req->dev->blk); >>>>> + int nb_sectors = 0; >>>>> + bool merge = true; >>>>> + >>>>> + if (!virtio_blk_sect_range_ok(req->dev, sector_num, >>>>> req->qiov.size)) { >>>>> + virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR); >>>>> + virtio_blk_free_request(req); >>>>> + return; >>>>> + } >>>>> + >>>>> + if (is_write) { >>>>> + qemu_iovec_init_external(&req->qiov, iov, out_num); >>>>> + trace_virtio_blk_handle_write(req, sector_num, >>>>> + req->qiov.size / >>>>> BDRV_SECTOR_SIZE); >>>>> + } else { >>>>> + qemu_iovec_init_external(&req->qiov, in_iov, in_num); >>>>> + trace_virtio_blk_handle_read(req, sector_num, >>>>> + req->qiov.size / >>>>> BDRV_SECTOR_SIZE); >>>>> + } >>>>> + >>>>> + nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE; >>>> qiov.size is controlled by the guest, and nb_sectors is only an int. Are >>>> you sure that this can't overflow? >>> >>> In theory, yes. For this to happen in_iov or iov needs to contain >>> 2TB of data on 32-bit systems. But theoretically there could >>> also be already an overflow in qemu_iovec_init_external where >>> multiple size_t are summed up in a size_t. >>> >>> There has been no overflow checking in the merge routine in >>> the past, but if you feel better, we could add sth like this: >>> >>> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c >>> index cc0076a..e9236da 100644 >>> --- a/hw/block/virtio-blk.c >>> +++ b/hw/block/virtio-blk.c >>> @@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, >>> MultiReqBuffer *mrb) >>> bool is_write = type & VIRTIO_BLK_T_OUT; >>> int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev), >>> &req->out.sector); >>> - int max_transfer_length = >>> blk_get_max_transfer_length(req->dev->blk); >>> - int nb_sectors = 0; >>> + int64_t max_transfer_length = >>> blk_get_max_transfer_length(req->dev->blk); >>> + int64_t nb_sectors = 0; >>> bool merge = true; >>> >>> if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) >>> { >>> @@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, >>> MultiReqBuffer *mrb) >>> } >>> >>> nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE; >>> + max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX); >>> >>> block_acct_start(blk_get_stats(req->dev->blk), >>> &req->acct, req->qiov.size, >>> @@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req, >>> MultiReqBuffer *mrb) >>> } >>> >>> /* merge would exceed maximum transfer length of backend device */ >>> - if (max_transfer_length && >>> - mrb->nb_sectors + nb_sectors > max_transfer_length) { >>> + if (nb_sectors + mrb->nb_sectors > max_transfer_length) { >>> merge = false; >>> } >> >> May also this here: >> >> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c >> index cc0076a..fa647b6 100644 >> --- a/hw/block/virtio-blk.c >> +++ b/hw/block/virtio-blk.c >> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, >> uint64_t nb_sectors = size >> BDRV_SECTOR_BITS; >> uint64_t total_sectors; >> >> + if (nb_sectors > INT_MAX) { >> + return false; >> + } >> if (sector & dev->sector_mask) { >> return false; >> } >> >> >> Thats something that has not been checked for ages as well. > > Adding checks can never hurt, so go for it. ;-)
i will add both checks and Fams comment and send a v2 tomorrow. Peter > > Kevin