On Mon, Apr 22, 2019 at 09:21:53PM -0700, Wei Li wrote:
> 2. kvm_stat or perf record -a -e kvm:\* counters for vmexits and
>    interrupt injections.  If these counters vary greatly between queue
>    sizes, then that is usually a clue.  It's possible to get higher
>    performance by spending more CPU cycles although your system doesn't
>    have many CPUs available, so I'm not sure if this is the case.
>
> [wei]: vmexits looks like a reason. I am using FIO tool to read/write block
> storage via following sample command, interesting thing is that kvm:kvm_exit
> count decreased from 846K to 395K after I increased num_queues from 2 to 4
> while the vCPU count is 2.
>   1). Does this mean using more queues than vCPU count may increase
>       IOPS via spending more CPU cycle?
>   2). Could you please help me better understand how more queues is
>       able to spend more CPU cycle? Thanks!
>   FIO command: fio --filename=/dev/sdb --direct=1 --rw=randrw
>   --bs=4k --ioengine=libaio --iodepth=64 --numjobs=4 --time_based
>   --group_reporting --name=iops --runtime=60 --eta-newline=1
>
> 3. Power management and polling (kvm.ko halt_poll_ns, tuned profiles,
>    and QEMU iothread poll-max-ns).  It's expensive to wake a CPU when it
>    goes into a low power mode due to idle.  There are several features
>    that can keep the CPU awake or even poll so that request latency is
>    reduced.  The reason why the number of queues may matter is that
>    kicking multiple queues may keep the CPU awake more than batching
>    multiple requests onto a small number of queues.
>
> [wei]: CPU awake could be another reason, I noticed that kvm:kvm_vcpu_wakeup
> count decreased from 151K to 47K after I increased num_queues from 2 to 4
> while the vCPU count is 2.
This suggests that wakeups are involved in the performance difference.

>   1). Does this mean more queues may keep CPU more busy and awake
>       which reduced the vcpu wakeup time?

Yes, although it depends on how I/O requests are distributed across the
queues.  You can check /proc/interrupts inside the guest to see
interrupt counts for the virtqueues.

>   2). If using more num_queues than vCPU count is able to get higher
> IOPS for this case, is it safe to use 4 queues while it only have 2 vCPU, or
> there is any concern or impact by using more queues than vCPU count which I
> should keep in mind?

2 vs 4 queues should be functionally identical.  The only difference is
performance.

> In addition, does Virtio-scsi support Batch I/O Submission feature which may
> be able to increase the IOPS via reducing the number of system calls?

I don't see obvious batching support in drivers/scsi/virtio_scsi.c.
The Linux block layer supports batching but I'm not sure if the SCSI
layer does.

Stefan
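
P.S. For the /proc/interrupts check mentioned above, here is a rough
sketch of how the per-queue interrupt counts could be pulled out inside
the guest.  The function name virtio_irq_counts and the default "virtio"
match string are made up for illustration, and the assumption that the
request queue interrupts are labelled something like "virtio0-request"
should be verified against the actual /proc/interrupts output in your
guest.

```python
def virtio_irq_counts(text, match="virtio"):
    """Return (irq, per_cpu_counts, description) for matching interrupts.

    `text` is the content of /proc/interrupts.  `match` is a substring
    looked up in the description column; the default is an assumption
    about how virtqueue interrupts are named in the guest.
    """
    lines = text.splitlines()
    if not lines:
        return []
    # The header row lists one column per online CPU: CPU0 CPU1 ...
    ncpus = len(lines[0].split())
    rows = []
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Skip summary rows (e.g. ERR) that have no description column.
        if len(fields) <= ncpus:
            continue
        counts = fields[:ncpus]
        if not all(c.isdigit() for c in counts):
            continue
        desc = " ".join(fields[ncpus:])
        if match in desc:
            rows.append((irq.strip(), [int(c) for c in counts], desc))
    return rows
```

Calling it on the contents of /proc/interrupts before and after a fio
run, and subtracting the counts, should show how the completions are
spread across the queues and vCPUs for num_queues=2 versus num_queues=4.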