Am 15.12.2014 um 16:52 hat Peter Lieven geschrieben:
> On 15.12.2014 16:43, Peter Lieven wrote:
> >On 15.12.2014 16:01, Kevin Wolf wrote:
> >>Am 09.12.2014 um 17:26 hat Peter Lieven geschrieben:
> >>>this patch finally introduces multiread support to virtio-blk. While
> >>>multiwrite support was there for a long time, read support was missing.
> >>>
> >>>To achieve this the patch does several things which might need further
> >>>explanation:
> >>>
> >>> - the whole merge and multireq logic is moved from block.c into
> >>>   virtio-blk. This move is a preparation for directly creating a
> >>>   coroutine out of virtio-blk.
> >>>
> >>> - requests are only merged if they are strictly sequential, and no
> >>>   longer sorted. This simplification decreases overhead and reduces
> >>>   latency. It will also merge some requests which were unmergeable
> >>>   before.
> >>>
> >>>   The old algorithm took up to 32 requests, sorted them and tried to
> >>>   merge them. The outcome was anything between 1 and 32 requests. In
> >>>   case of 32 requests there were 31 requests unnecessarily delayed.
> >>>
> >>>   On the other hand, let's imagine e.g. 16 unmergeable requests
> >>>   followed by 32 mergeable requests. The latter 32 requests would
> >>>   have been split across two batches of 16.
> >>>
> >>>   Lastly, the simplified logic allows for a fast path if we have only
> >>>   a single request in the multirequest. In this case the request is
> >>>   sent as an ordinary request without multireq callbacks.
> >>>
> >>>As a first benchmark I installed Ubuntu 14.04.1 on a local SSD. The
> >>>number of merged requests is in the same order while the write latency
> >>>is decreased by several percent.
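[Aside for readers following along: the "strictly sequential" rule described above boils down to a check like the following. This is an illustrative sketch with made-up names (`Batch`, `can_merge`), not the actual virtio-blk code.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch of the strictly sequential merge rule: a request
 * joins the current batch only if it has the same direction and starts
 * exactly where the batch ends. No sorting, no gap, no overlap. */
typedef struct {
    int64_t sector_num;   /* first sector covered by the batch */
    int64_t nb_sectors;   /* sectors accumulated so far */
    bool    is_write;
    int     num_reqs;
} Batch;

static bool can_merge(const Batch *b, int64_t sector_num, bool is_write)
{
    if (b->num_reqs == 0) {
        return true;                  /* empty batch accepts anything */
    }
    if (b->is_write != is_write) {
        return false;                 /* reads and writes never mix */
    }
    /* strictly sequential: next request must start where the batch ends */
    return sector_num == b->sector_num + b->nb_sectors;
}
```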
> >>>
> >>>cmdline:
> >>>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm \
> >>>    -cdrom ubuntu-14.04.1-server-amd64.iso \
> >>>    -drive if=virtio,file=/dev/ssd/ubuntu1404,aio=native,cache=none \
> >>>    -monitor stdio
> >>>
> >>>Before:
> >>>virtio0:
> >>>  rd_bytes=151056896 wr_bytes=2683947008 rd_operations=18614
> >>>  wr_operations=67979 flush_operations=15335
> >>>  wr_total_time_ns=540428034217 rd_total_time_ns=11110520068
> >>>  flush_total_time_ns=40673685006 rd_merged=0 wr_merged=15531
> >>>
> >>>After:
> >>>virtio0:
> >>>  rd_bytes=149487104 wr_bytes=2701344768 rd_operations=18148
> >>>  wr_operations=68578 flush_operations=15368
> >>>  wr_total_time_ns=437030089565 rd_total_time_ns=9836288815
> >>>  flush_total_time_ns=40597981121 rd_merged=690 wr_merged=14615
> >>>
> >>>Some first numbers of improved read performance while booting:
> >>>
> >>>The Ubuntu 14.04.1 vServer from above:
> >>>virtio0:
> >>>  rd_bytes=97545216 wr_bytes=119808 rd_operations=5071 wr_operations=26
> >>>  flush_operations=2 wr_total_time_ns=8847669
> >>>  rd_total_time_ns=13952575478 flush_total_time_ns=3075496
> >>>  rd_merged=742 wr_merged=0
> >>>
> >>>Windows 2012R2 (booted from iSCSI):
> >>>virtio0:
> >>>  rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200
> >>>  wr_operations=360 flush_operations=68 wr_total_time_ns=34344992718
> >>>  rd_total_time_ns=134386844669 flush_total_time_ns=18115517
> >>>  rd_merged=641 wr_merged=216
> >>>
> >>>Signed-off-by: Peter Lieven <p...@kamp.de>
> >>Looks pretty good. The only thing I'm still unsure about is possible
> >>integer overflows in the merging logic. Maybe you can have another look
> >>there (ideally not only at the places I commented on below, but at the
> >>whole function).
> >>
> >>>@@ -414,14 +402,81 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >>>MultiReqBuffer *mrb)
> >>>         iov_from_buf(in_iov, in_num, 0, serial, size);
> >>>         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> >>>         virtio_blk_free_request(req);
> >>>-    } else if (type & VIRTIO_BLK_T_OUT) {
> >>>-        qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>-        virtio_blk_handle_write(req, mrb);
> >>>-    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
> >>>-        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
> >>>-        qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>-        virtio_blk_handle_read(req);
> >>>-    } else {
> >>>+        break;
> >>>+    }
> >>>+    case VIRTIO_BLK_T_IN:
> >>>+    case VIRTIO_BLK_T_OUT:
> >>>+    {
> >>>+        bool is_write = type & VIRTIO_BLK_T_OUT;
> >>>+        int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >>>+                                          &req->out.sector);
> >>>+        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >>>+        int nb_sectors = 0;
> >>>+        bool merge = true;
> >>>+
> >>>+        if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> >>>+            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
> >>>+            virtio_blk_free_request(req);
> >>>+            return;
> >>>+        }
> >>>+
> >>>+        if (is_write) {
> >>>+            qemu_iovec_init_external(&req->qiov, iov, out_num);
> >>>+            trace_virtio_blk_handle_write(req, sector_num,
> >>>+                                          req->qiov.size / BDRV_SECTOR_SIZE);
> >>>+        } else {
> >>>+            qemu_iovec_init_external(&req->qiov, in_iov, in_num);
> >>>+            trace_virtio_blk_handle_read(req, sector_num,
> >>>+                                         req->qiov.size / BDRV_SECTOR_SIZE);
> >>>+        }
> >>>+
> >>>+        nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >>qiov.size is controlled by the guest, and nb_sectors is only an int. Are
> >>you sure that this can't overflow?
> >
> >In theory, yes. For this to happen in_iov or iov needs to contain
> >2TB of data on 32-bit systems. But theoretically there could
> >also be already an overflow in qemu_iovec_init_external where
> >multiple size_t are summed up in a size_t.
> >
> >There has been no overflow checking in the merge routine in
> >the past, but if you feel better, we could add something like this:
> >
> >diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> >index cc0076a..e9236da 100644
> >--- a/hw/block/virtio-blk.c
> >+++ b/hw/block/virtio-blk.c
> >@@ -410,8 +410,8 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> >         bool is_write = type & VIRTIO_BLK_T_OUT;
> >         int64_t sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
> >                                           &req->out.sector);
> >-        int max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >-        int nb_sectors = 0;
> >+        int64_t max_transfer_length = blk_get_max_transfer_length(req->dev->blk);
> >+        int64_t nb_sectors = 0;
> >         bool merge = true;
> >
> >         if (!virtio_blk_sect_range_ok(req->dev, sector_num, req->qiov.size)) {
> >@@ -431,6 +431,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> >         }
> >
> >         nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
> >+        max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
> >
> >         block_acct_start(blk_get_stats(req->dev->blk),
> >                          &req->acct, req->qiov.size,
> >@@ -443,8 +444,7 @@ void virtio_blk_handle_request(VirtIOBlockReq *req,
> >MultiReqBuffer *mrb)
> >         }
> >
> >         /* merge would exceed maximum transfer length of backend device */
> >-        if (max_transfer_length &&
> >-            mrb->nb_sectors + nb_sectors > max_transfer_length) {
> >+        if (nb_sectors + mrb->nb_sectors > max_transfer_length) {
> >             merge = false;
> >         }
>
> Maybe also this here:
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index cc0076a..fa647b6 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -333,6 +333,9 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
>      uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
>      uint64_t total_sectors;
>
> +    if (nb_sectors > INT_MAX) {
> +        return false;
> +    }
>      if (sector & dev->sector_mask) {
>          return false;
>      }
>
> That's something that has not been checked for ages as well.
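[Aside: taken together, the two hunks above harden the path roughly as follows. This is a stand-alone sketch, assuming BDRV_SECTOR_BITS == 9 and with MIN_NON_ZERO reproduced locally; function names are made up for illustration.]

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* MIN_NON_ZERO mirrors QEMU's macro: a value of 0 means "no limit". */
#define MIN_NON_ZERO(a, b) ((a) == 0 ? (b) : ((a) < (b) ? (a) : (b)))
#define BDRV_SECTOR_BITS 9

/* Reject guest-supplied sizes whose sector count would not fit in an
 * int (the virtio_blk_sect_range_ok addition). */
static bool sect_count_ok(uint64_t size)
{
    return (size >> BDRV_SECTOR_BITS) <= INT_MAX;
}

/* With both operands widened to int64_t and the limit clamped to
 * INT_MAX, the sum is at most 2 * INT_MAX and cannot overflow, so the
 * comparison is always well-defined. */
static bool merge_within_limit(int64_t batch_sectors, int64_t req_sectors,
                               int64_t max_transfer_length)
{
    max_transfer_length = MIN_NON_ZERO(max_transfer_length, INT_MAX);
    return batch_sectors + req_sectors <= max_transfer_length;
}
```

Clamping with MIN_NON_ZERO is what lets the patch drop the explicit `max_transfer_length &&` guard: the "no limit" case 0 simply becomes an INT_MAX cap.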
Adding checks can never hurt, so go for it. ;-)

Kevin