On Thu, Aug 01, 2019 at 05:54:05PM +0100, Stefan Hajnoczi wrote:
> Performance
> -----------
> Please try these patches out and share your results.
Here are the performance numbers:

Threadpool | iodepth | iodepth
size       |    1    |   64
-----------+---------+--------
None       |  4451   |  4876
1          |  4360   |  4858
64         |  4359   | 33,266

A graph is available here:
https://vmsplice.net/~stefan/virtiofsd-threadpool-performance.png

Summary:

 * iodepth=64 performance is increased by 6.8 times.
 * iodepth=1 performance degrades by 2%.
 * DAX is bottlenecked by QEMU's single-threaded
   VHOST_USER_SLAVE_FS_MAP/UNMAP handler.

Threadpool size "none" is virtiofsd commit 813a824b707 ("virtiofsd: use
fuse_lowlevel_is_virtio() in fuse_session_destroy()") without any of the
multithreading preparation patches.  I benchmarked this to check whether
the patches introduce a regression for iodepth=1.  They do, but it's only
around 2%.

I also ran with DAX but found there was not much difference between
iodepth=1 and iodepth=64.  This might be because the host mmap(2) syscall
becomes the bottleneck and a serialization point.  QEMU only processes one
VHOST_USER_SLAVE_FS_MAP/UNMAP at a time.  If we want to accelerate DAX it
may be necessary to parallelize mmap, assuming the host kernel can do them
in parallel on a single file (a rough sketch of such a test is appended
after my signature).  This performance optimization is future work and not
directly related to this patch series.

The following fio job was run with cache=none and no DAX:

[global]
runtime=60
ramp_time=30
filename=/var/tmp/fio.dat
direct=1
rw=randread
bs=4k
size=4G
ioengine=libaio
iodepth=1

[read]

Guest configuration:
1 vCPU
4 GB RAM
Linux 5.1 (vivek-aug-06-2019)

Host configuration:
Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz (2 cores x 2 threads)
8 GB RAM
Linux 5.1.20-300.fc30.x86_64
XFS + dm-thin + dm-crypt
Toshiba THNSFJ256GDNU (256 GB SATA SSD)

Stefan
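
P.S. To make the "can the host kernel do mmap(2) on a single file in
parallel?" question concrete, here is a rough, untested sketch (not part
of this series) that maps and unmaps distinct windows of one file from
several threads.  Comparing wall-clock time with NTHREADS=1 vs. a larger
thread count would give a first hint at how much serialization there is.
The thread count, iteration count and window size are arbitrary, and it
reuses the fio data file only because a large file already exists there.

/* Build with: gcc -O2 -pthread parallel_mmap_test.c -o parallel_mmap_test */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   8                    /* arbitrary */
#define ITERATIONS 100000               /* arbitrary */
#define REGION     (4UL * 1024 * 1024)  /* 4 MiB window per thread */

static int fd;

static void *map_worker(void *arg)
{
    unsigned long idx = (unsigned long)arg;

    /* Repeatedly map and unmap a distinct window of the same file. */
    for (int i = 0; i < ITERATIONS; i++) {
        void *p = mmap(NULL, REGION, PROT_READ, MAP_SHARED,
                       fd, (off_t)idx * REGION);
        if (p == MAP_FAILED) {
            perror("mmap");
            break;
        }
        munmap(p, REGION);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    /* Reuse the 4G fio data file as a convenient large backing file. */
    fd = open("/var/tmp/fio.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (unsigned long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, map_worker, (void *)i);
    for (unsigned long i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    close(fd);
    return 0;
}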