Am 04.12.2014 um 15:42 hat Peter Lieven geschrieben: > On 04.12.2014 15:12, Kevin Wolf wrote: > >Am 02.12.2014 um 15:33 hat Peter Lieven geschrieben: > >>this patch finally introduce multiread support to virtio-blk while > >>multiwrite support was there for a long time read support was missing. > >> > >>To achieve this the patch does serveral things which might need futher > >>explaination: > >> > >> - the whole merge and multireq logic is moved from block.c into > >> virtio-blk. This is move is a preparation for directly creating a > >> coroutine out of virtio-blk. > >> > >> - requests are only merged if they are strictly sequential and no > >> longer sorted. This simplification decreases overhead and reduces > >> latency. It will also merge some requests which were unmergable before. > >> > >> The old algorithm took up to 32 requests sorted them and tried to merge > >> them. The outcome was anything between 1 and 32 requests. In case of > >> 32 requests there were 31 requests unnecessarily delayed. > >> > >> On the other hand lets imagine e.g. 16 unmergeable requests followed > >> by 32 mergable requests. The latter 32 requests would have been split > >> into two 16 byte requests. > >> > >> Last the simplified logic allows for a fast path if we have only a > >> single request in the multirequest. In this case the request is sent as > >> ordinary request without mulltireq callbacks. > >> > >>As a first benchmark I installed Ubuntu 14.04.1 on a ramdisk. The number of > >>merged requests is in the same order while the latency is slightly > >>decreased. > >>One should not stick to much to the numbers because the number of > >>wr_requests > >>are highly fluctuant. I hope the numbers show that this patch is at least > >>not causing to big harm: > >> > >>Cmdline: > >>qemu-system-x86_64 -m 1024 -smp 2 -enable-kvm -cdrom > >>ubuntu-14.04.1-server-amd64.iso \ > >> -drive if=virtio,file=/tmp/ubuntu.raw -monitor stdio > >> > >>Before: > >>virtio0: rd_bytes=150979072 wr_bytes=2591138816 rd_operations=18475 > >>wr_operations=69216 > >> flush_operations=15343 wr_total_time_ns=26969283701 > >> rd_total_time_ns=1018449432 > >> flush_total_time_ns=562596819 rd_merged=0 wr_merged=10453 > >> > >>After: > >>virtio0: rd_bytes=148112896 wr_bytes=2845834240 rd_operations=17535 > >>wr_operations=72197 > >> flush_operations=15760 wr_total_time_ns=26104971623 > >> rd_total_time_ns=1012926283 > >> flush_total_time_ns=564414752 rd_merged=517 wr_merged=9859 > >> > >>Some first numbers of improved read performance while booting: > >> > >>The Ubuntu 14.04.1 vServer from above: > >>virtio0: rd_bytes=86158336 wr_bytes=688128 rd_operations=5851 > >>wr_operations=70 > >> flush_operations=4 wr_total_time_ns=10886705 > >> rd_total_time_ns=416688595 > >> flush_total_time_ns=288776 rd_merged=1297 wr_merged=2 > >> > >>Windows 2012R2: > >>virtio0: rd_bytes=176559104 wr_bytes=61859840 rd_operations=7200 > >>wr_operations=360 > >> flush_operations=68 wr_total_time_ns=34344992718 > >> rd_total_time_ns=134386844669 > >> flush_total_time_ns=18115517 rd_merged=641 wr_merged=216 > >> > >>Signed-off-by: Peter Lieven <p...@kamp.de> > >>--- > >> hw/block/dataplane/virtio-blk.c | 10 +- > >> hw/block/virtio-blk.c | 222 > >> ++++++++++++++++++++++++--------------- > >> include/hw/virtio/virtio-blk.h | 23 ++-- > >> 3 files changed, 156 insertions(+), 99 deletions(-) > >> > >>diff --git a/hw/block/dataplane/virtio-blk.c > >>b/hw/block/dataplane/virtio-blk.c > >>index 1222a37..aa4ad91 100644 > >>--- a/hw/block/dataplane/virtio-blk.c > >>+++ b/hw/block/dataplane/virtio-blk.c > >>@@ -96,9 +96,8 @@ static void handle_notify(EventNotifier *e) > >> event_notifier_test_and_clear(&s->host_notifier); > >> blk_io_plug(s->conf->conf.blk); > >> for (;;) { > >>- MultiReqBuffer mrb = { > >>- .num_writes = 0, > >>- }; > >>+ MultiReqBuffer mrb_rd = {}; > >>+ MultiReqBuffer mrb_wr = {.is_write = 1}; > >> int ret; > >> /* Disable guest->host notifies to avoid unnecessary vmexits */ > >>@@ -117,10 +116,11 @@ static void handle_notify(EventNotifier *e) > >> req->elem.in_num, > >> req->elem.index); > >>- virtio_blk_handle_request(req, &mrb); > >>+ virtio_blk_handle_request(req, &mrb_wr, &mrb_rd); > >> } > >>- virtio_submit_multiwrite(s->conf->conf.blk, &mrb); > >>+ virtio_submit_multireq(s->conf->conf.blk, &mrb_wr); > >>+ virtio_submit_multireq(s->conf->conf.blk, &mrb_rd); > >> if (likely(ret == -EAGAIN)) { /* vring emptied */ > >> /* Re-enable guest->host notifies and stop processing the > >> vring. > >>diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > >>index 490f961..f522bfc 100644 > >>--- a/hw/block/virtio-blk.c > >>+++ b/hw/block/virtio-blk.c > >>@@ -22,12 +22,15 @@ > >> #include "dataplane/virtio-blk.h" > >> #include "migration/migration.h" > >> #include "block/scsi.h" > >>+#include "block/block_int.h" > >No. :-) > > > >We could either expose the information that you need through > >BlockBackend, or wait until your automatic request splitting logic is > >implemented in block.c and the information isn't needed here any more. > > I will add sth to block.c and follow-up with the request splitting logic > later. It will still make sense to now the limit here. Otherwise we > merge something that we split again afterwards.
Yup, but then make it an official block layer interface instead of including block_int.h in the device emulation. > > > >> #ifdef __linux__ > >> # include <scsi/sg.h> > >> #endif > >> #include "hw/virtio/virtio-bus.h" > >> #include "hw/virtio/virtio-access.h" > >>+/* #define DEBUG_MULTIREQ */ > >>+ > >> VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s) > >> { > >> VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq); > >>@@ -88,6 +91,11 @@ static void virtio_blk_rw_complete(void *opaque, int ret) > >> trace_virtio_blk_rw_complete(req, ret); > >>+#ifdef DEBUG_MULTIREQ > >>+ printf("virtio_blk_rw_complete p %p ret %d\n", > >>+ req, ret); > >>+#endif > >In the final version, please use tracepoints instead of printfs. > > of course. This is just my debug code. > > > > >> if (ret) { > >> int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type); > >> bool is_read = !(p & VIRTIO_BLK_T_OUT); > >>@@ -257,24 +265,63 @@ static void virtio_blk_handle_scsi(VirtIOBlockReq > >>*req) > >> virtio_blk_free_request(req); > >> } > >>-void virtio_submit_multiwrite(BlockBackend *blk, MultiReqBuffer *mrb) > >>+static void virtio_multireq_cb(void *opaque, int ret) > >>+{ > >>+ MultiReqBuffer *mrb = opaque; > >>+ int i; > >>+#ifdef DEBUG_MULTIREQ > >>+ printf("virtio_multireq_cb: p %p sector_num %ld nb_sectors %d is_write > >>%d num_reqs %d\n", > >>+ mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, > >>mrb->num_reqs); > >>+#endif > >>+ for (i = 0; i < mrb->num_reqs; i++) { > >>+ virtio_blk_rw_complete(mrb->reqs[i], ret); > >>+ } > >>+ > >>+ qemu_iovec_destroy(&mrb->qiov); > >>+ g_free(mrb); > >>+} > >>+ > >>+void virtio_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb0) > >> { > >>- int i, ret; > >>+ MultiReqBuffer *mrb = NULL; > >>- if (!mrb->num_writes) { > >>+ if (!mrb0->num_reqs) { > >> return; > >> } > >>- ret = blk_aio_multiwrite(blk, mrb->blkreq, mrb->num_writes); > >>- if (ret != 0) { > >>- for (i = 0; i < mrb->num_writes; i++) { > >>- if (mrb->blkreq[i].error) { > >>- virtio_blk_rw_complete(mrb->blkreq[i].opaque, -EIO); > >>- } > >>+ if (mrb0->num_reqs == 1) { > >>+ if (mrb0->is_write) { > >>+ blk_aio_writev(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, > >>mrb0->nb_sectors, > >>+ virtio_blk_rw_complete, mrb0->reqs[0]); > >>+ } else { > >>+ blk_aio_readv(blk, mrb0->sector_num, &mrb0->reqs[0]->qiov, > >>mrb0->nb_sectors, > >>+ virtio_blk_rw_complete, mrb0->reqs[0]); > >> } > >>+ qemu_iovec_destroy(&mrb0->qiov); > >>+ mrb0->num_reqs = 0; > >>+ return; > >>+ } > >>+ > >>+ mrb = g_malloc(sizeof(MultiReqBuffer)); > >>+ memcpy(mrb, mrb0, sizeof(MultiReqBuffer)); > >>+ mrb0->num_reqs = 0; > >We probably want to avoid allocations as much as we can. If at all > >possible, we don't want to memcpy() either. > > > >Most parts of MultiReqBuffer don't seem to be actually needed during the > >request and in the callback any more. Essentially, all that is needed is > >the qiov and a way to find all the requests that belong to the same > >MultiReqBuffer, so that they can all be completed. > > > >We could create a linked list by having a VirtIOBlockReq *next in the > >request struct. For the qiov we can probably just use the field in the > >first request (i.e. we extend the qiov of the first request). > > Thats a good idea and should work. I will look into this. > But where would you store the merge qiov? > > It looks like I would need a small struct which holds the qiov and the > pointer to req[0]. > This can then be used as opaque value to the callback. My thought was that you replace the original qiov of the first request with the merged one. I don't think you need the original one any more after merging the requests. Alternatively, you could use a separate field in VirtIOBlockReq which stays unused in all but the first request. In any case the goal is to get rid of a dynamically allocated struct, no matter how small, so opaque must be directly pointing to a request in the end. > >With these requirements removed, MultiReqBuffer can live on the stack > >and doesn't need to be copied to the heap. > > > > > >With the additional patch that you send me, the next few lines read: > > > >+ qemu_iovec_init(&mrb->qiov, mrb->num_reqs); > >+ > >+ for (i = 0; i < mrb->num_reqs; i++) { > >+ qemu_iovec_concat(&mrb->qiov, &mrb->reqs[i]->qiov, 0, > >mrb->reqs[i]->qiov.size); > >+ } > >+ assert(mrb->nb_sectors == mrb->qiov.size / BDRV_SECTOR_SIZE); > > > >The long line is more than 80 characters, but probably more importantly, > >mrb->num_reqs is a really bad estimation for niov. If the guest uses any > >vectored I/O, we're sure to get reallocations. What you really want to > >use here is the old mrb->qiov.niov (which is an abuse to store in the > >qiov, see below). > > Of course, my fault. I was so happy to have an exact value for the number > of niov that I put in the wrong variable ;-) Happens. :-) > >(I also wonder whether it would make sense to keep qiovs allocated when > >putting VirtIOBlockReqs into the pool, but that's not for this series.) > > > >>+ > >>+#ifdef DEBUG_MULTIREQ > >>+ printf("virtio_submit_multireq: p %p sector_num %ld nb_sectors %d > >>is_write %d num_reqs %d\n", > >>+ mrb, mrb->sector_num, mrb->nb_sectors, mrb->is_write, > >>mrb->num_reqs); > >>+#endif > >>+ > >>+ if (mrb->is_write) { > >>+ blk_aio_writev(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, > >>+ virtio_multireq_cb, mrb); > >>+ } else { > >>+ blk_aio_readv(blk, mrb->sector_num, &mrb->qiov, mrb->nb_sectors, > >>+ virtio_multireq_cb, mrb); > >> } > >>- mrb->num_writes = 0; > >>+ block_acct_merge_done(blk_get_stats(blk), > >>+ mrb->is_write ? BLOCK_ACCT_WRITE : > >>BLOCK_ACCT_READ, > >>+ mrb->num_reqs - 1); > >> } > >> static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer > >> *mrb) Kevin