On Mon, Feb 03, 2020 at 12:39:49PM +0100, Sergio Lopez wrote:
> On Mon, Feb 03, 2020 at 10:57:44AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Feb 03, 2020 at 11:25:29AM +0100, Sergio Lopez wrote:
> > > On Thu, Jan 30, 2020 at 10:52:35AM +0000, Stefan Hajnoczi wrote:
> > > > On Thu, Jan 30, 2020 at 01:29:16AM +0100, Paolo Bonzini wrote:
> > > > > On 29/01/20 16:44, Stefan Hajnoczi wrote:
> > > > > > On Mon, Jan 27, 2020 at 02:10:31PM +0100, Cornelia Huck wrote:
> > > > > >> On Fri, 24 Jan 2020 10:01:57 +0000
> > > > > >> Stefan Hajnoczi <stefa...@redhat.com> wrote:
> > > > > >>> @@ -47,10 +48,15 @@ static void vhost_scsi_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > > > >>>  {
> > > > > >>>      VHostSCSIPCI *dev = VHOST_SCSI_PCI(vpci_dev);
> > > > > >>>      DeviceState *vdev = DEVICE(&dev->vdev);
> > > > > >>> -    VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
> > > > > >>> +    VirtIOSCSIConf *conf = &dev->vdev.parent_obj.parent_obj.conf;
> > > > > >>> +
> > > > > >>> +    /* 1:1 vq to vcpu mapping is ideal because it avoids IPIs */
> > > > > >>> +    if (conf->num_queues == VIRTIO_SCSI_AUTO_NUM_QUEUES) {
> > > > > >>> +        conf->num_queues = current_machine->smp.cpus;
> > > > > >>
> > > > > >> This now maps the request vqs 1:1 to the vcpus. What about the fixed
> > > > > >> vqs? If they don't really matter, amend the comment to explain that?
> > > > > >
> > > > > > The fixed vqs don't matter. They are typically not involved in the
> > > > > > data path, only the control path where performance doesn't matter.
> > > > >
> > > > > Should we put a limit on the number of vCPUs? For anything above ~128
> > > > > the guest is probably not going to be disk or network bound.
> > > >
> > > > Michael Tsirkin pointed out there's a hard limit of VIRTIO_QUEUE_MAX
> > > > (1024). We need to at least stay under that limit.
> > > >
> > > > Should the guest have >128 virtqueues? Each virtqueue requires guest
> > > > RAM and 2 host eventfds. Eventually these resource requirements will
> > > > become a scalability problem, but how do we choose a hard limit and
> > > > what happens to guest performance above that limit?
> > >
> > > From the UX perspective, I think it's safer to use a rather low upper
> > > limit for the automatic configuration.
> > >
> > > Users of large VMs (>=32 vCPUs) aiming for optimal performance are
> > > already facing the need to tune manually (or rely on software to do
> > > that for them) other aspects of the VM, like vNUMA, IOThreads and CPU
> > > pinning, so I don't think we should focus on this group.
> >
> > Whether they're tuning manually or relying on software to tune for
> > them, we (QEMU maintainers) still need to provide credible guidance
> > on what to do about tuning for large CPU counts. Without clear info
> > from QEMU, it just descends into hearsay and guesswork, both of which
> > leave QEMU looking bad.
>
> I agree. Good documentation, ideally with some benchmarks, and safe
> defaults sound like a good approach to me.
>
> > So I think we need to, at the very least, make a clear statement here
> > about what tuning approach should be applied when the vCPU count gets
> > high, and probably even apply that as a default out-of-the-box
> > approach.
>
> In general, I would agree, but in this particular case the
> optimization has an impact on something outside QEMU's control (the
> host's resources), so we lack the information needed to make a proper
> guess.
>
> My main concern here is users upgrading QEMU and hitting some kind of
> crash or performance issue without having touched their VM config. And
> let's not forget that Stefan said in the cover letter that this
> amounts to a 1-4% improvement on 4k operations on an SSD, and I guess
> that's with iodepth=1. I suspect that with a larger block size and/or
> higher iodepth the improvement will be barely noticeable, which means
> it'll only have a positive impact on users running DB/OLTP or similar
> workloads on dedicated, directly attached, low-latency storage.
>
> But don't get me wrong, this is a *good* optimization. It's just that
> I think we should play safe here.
>
> Sergio.
Yeah, I think a bit more benchmarking, with more than 4 vCPUs, can't hurt, so that at least we can see the trend.