> On Dec 18, 2017, at 6:38 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> On Fri, Dec 15, 2017 at 06:02:50PM +0300, Denis V. Lunev wrote:
>> Linux guests submit IO requests no longer than the PAGE_SIZE * max_seg
>> limit reported by the SCSI controller. Thus a typical sequential read
>> of 1 MB results in the following IO pattern from the guest:
>>   8,16   1    15754     2.766095122  2071  D   R 2095104 + 1008 [dd]
>>   8,16   1    15755     2.766108785  2071  D   R 2096112 + 1008 [dd]
>>   8,16   1    15756     2.766113486  2071  D   R 2097120 + 32 [dd]
>>   8,16   1    15757     2.767668961     0  C   R 2095104 + 1008 [0]
>>   8,16   1    15758     2.768534315     0  C   R 2096112 + 1008 [0]
>>   8,16   1    15759     2.768539782     0  C   R 2097120 + 32 [0]
>> The IO was generated by
>>     dd if=/dev/sda of=/dev/null bs=1024 iflag=direct
>>
>> This effectively means that on rotational disks we will observe 3 IOPS
>> for each 2 MB processed. This definitely negatively affects both
>> guest and host IO performance.
>>
>> The cure is relatively simple - we should report the full scatter-gather
>> capability of the SCSI controller. Fortunately the situation here is very
>> good: the VirtIO transport layer can accommodate 1024 items in one request,
>> while we are using only 128. This has been the case since almost the very
>> beginning. 2 items are dedicated to request metadata, thus we should
>> publish VIRTQUEUE_MAX_SIZE - 2 as max_seg.
>>
>> The following pattern is observed after the patch:
>>   8,16   1     9921     2.662721340  2063  D   R 2095104 + 1024 [dd]
>>   8,16   1     9922     2.662737585  2063  D   R 2096128 + 1024 [dd]
>>   8,16   1     9923     2.665188167     0  C   R 2095104 + 1024 [0]
>>   8,16   1     9924     2.665198777     0  C   R 2096128 + 1024 [0]
>> which is much better.
>>
>> The dark side of this patch is that we are tweaking a guest-visible
>> parameter, though this should be relatively safe as the above transport
>> layer support has been present in QEMU/host Linux for a very long time.
>> The patch adds a configurable property for VirtIO SCSI with a new default,
>> and a hardcoded value for VirtBlock, which does not provide a good
>> configuration framework.
>>
>> Signed-off-by: Denis V. Lunev <d...@openvz.org>
>> CC: "Michael S. Tsirkin" <m...@redhat.com>
>> CC: Stefan Hajnoczi <stefa...@redhat.com>
>> CC: Kevin Wolf <kw...@redhat.com>
>> CC: Max Reitz <mre...@redhat.com>
>> CC: Paolo Bonzini <pbonz...@redhat.com>
>> CC: Richard Henderson <r...@twiddle.net>
>> CC: Eduardo Habkost <ehabk...@redhat.com>
>> ---
>>  include/hw/compat.h             | 17 +++++++++++++++++
>>  include/hw/virtio/virtio-blk.h  |  1 +
>>  include/hw/virtio/virtio-scsi.h |  1 +
>>  hw/block/virtio-blk.c           |  4 +++-
>>  hw/scsi/vhost-scsi.c            |  2 ++
>>  hw/scsi/vhost-user-scsi.c       |  2 ++
>>  hw/scsi/virtio-scsi.c           |  4 +++-
>>  7 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>> index 026fee9..b9be5d7 100644
>> --- a/include/hw/compat.h
>> +++ b/include/hw/compat.h
>> @@ -2,6 +2,23 @@
>>  #define HW_COMPAT_H
>>
>>  #define HW_COMPAT_2_11 \
>> +    {\
>> +        .driver   = "virtio-blk-device",\
>> +        .property = "max_segments",\
>> +        .value    = "126",\
>> +    },{\
>> +        .driver   = "vhost-scsi",\
>> +        .property = "max_segments",\
>> +        .value    = "126",\
>> +    },{\
>> +        .driver   = "vhost-user-scsi",\
>> +        .property = "max_segments",\
>> +        .value    = "126",\
>
> Existing vhost-user-scsi slave programs might not expect up to 1022
> segments.  Hopefully we can get away with this change since there are
> relatively few vhost-user-scsi slave programs.
>
> CCed Felipe (Nutanix) and Jim (SPDK) in case they have comments.
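
For reference, a minimal sketch of the arithmetic behind the quoted commit
message, assuming a 4 KiB guest page size and the 1024-entry virtqueue
mentioned above. The constant names here (GUEST_PAGE_SIZE, METADATA_DESCS)
are chosen for illustration and are not taken from the patch:

    #include <stdio.h>

    /* Assumed values for illustration: 4 KiB guest pages, 1024-entry
     * virtqueue, 2 descriptors reserved for request header/status. */
    #define GUEST_PAGE_SIZE    4096
    #define VIRTQUEUE_MAX_SIZE 1024
    #define METADATA_DESCS     2

    int main(void)
    {
        unsigned old_max_seg = 128 - METADATA_DESCS;                 /* 126  */
        unsigned new_max_seg = VIRTQUEUE_MAX_SIZE - METADATA_DESCS;  /* 1022 */

        /* Largest single request the guest block layer will build. */
        printf("old limit: %u KiB per request\n",
               old_max_seg * GUEST_PAGE_SIZE / 1024);
        printf("new limit: %u KiB per request\n",
               new_max_seg * GUEST_PAGE_SIZE / 1024);
        return 0;
    }

Run as-is this prints 504 KiB for the old 126-segment limit, which matches
the 1008-sector requests in the first trace, and 4088 KiB for the proposed
1022-segment limit.
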
SPDK vhost-user targets currently expect a maximum of 128 segments. They also
pre-allocate I/O task structures when QEMU connects to the vhost-user device.
Supporting up to 1022 segments would mean significantly higher memory usage, a
reduction in the I/O queue depth processed by the vhost-user target, or having
to dynamically allocate I/O task structures - none of which are ideal.

What if this were just bumped from 126 to 128?

I guess I'm trying to understand the level of guest and host I/O performance
that is gained with this patch. One I/O per 512 KB vs. one I/O per 4 MB - we
are still only talking about a difference of a few hundred IO/s.

-Jim
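
To put rough numbers on that comparison: assuming a rotational disk sustaining
about 200 MB/s of sequential reads (the throughput figure is an assumption,
not something measured in this thread), the command-rate difference works out
roughly as follows:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed sequential throughput of a rotational disk; illustrative. */
        const double throughput_kb_s = 200.0 * 1024.0;   /* ~200 MB/s */

        const double old_io_kb = 512.0;    /* ~128 segments  * 4 KiB */
        const double new_io_kb = 4096.0;   /* ~1022 segments * 4 KiB */

        double old_cmds = throughput_kb_s / old_io_kb;   /* ~400 commands/s */
        double new_cmds = throughput_kb_s / new_io_kb;   /*  ~50 commands/s */

        printf("old: %.0f cmds/s, new: %.0f cmds/s, delta: %.0f cmds/s\n",
               old_cmds, new_cmds, old_cmds - new_cmds);
        return 0;
    }

Under those assumptions the delta is roughly 350 commands per second, in line
with the "few hundred IO/s" estimate above.
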