Hello,

I set cache=none in virtiofsd and direct=1 in fio; here are the results, together with the kvm-exit counts over 5 seconds.
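(For reference, one way to collect such a count on the host is something like the command below. This is only a sketch of what a 5-second sample could look like; it assumes perf is installed and the kvm:kvm_exit tracepoint is available, and it counts exits system-wide. perf record, as Stefan suggested below, works as well.)

```
# count kvm:kvm_exit events system-wide for 5 seconds
perf stat -e kvm:kvm_exit -a -- sleep 5
```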
--thread-pool-size=64 (default)
seq read: 307 MB/s (kvm-exit count=1076463)
seq write: 430 MB/s (kvm-exit count=1302493)
rand 4KB read: 65.2k IOPS (kvm-exit count=1322899)
rand 4KB write: 97.2k IOPS (kvm-exit count=1568618)

--thread-pool-size=1
seq read: 303 MB/s (kvm-exit count=1034614)
seq write: 358 MB/s (kvm-exit count=1537735)
rand 4KB read: 7995 IOPS (kvm-exit count=438348)
rand 4KB write: 97.7k IOPS (kvm-exit count=1907585)

--thread-pool-size=64 improves the rand 4KB read performance considerably without increasing the kvm-exit count by much. In addition, the fio average clat for rand 4K write is 960 us with thread-pool-size=64 and 7700 us with thread-pool-size=1.

Regards,
Derek

On Tue, Jul 28, 2020 at 9:49 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> > I'm trying and testing the virtio-fs feature in QEMU v5.0.0.
> > My host and guest OS are both Ubuntu 18.04 with kernel 5.4, and the
> > underlying storage is a single SSD.
> >
> > The configurations are:
> >
> > (1) virtiofsd
> > ./virtiofsd -o source=/mnt/ssd/virtiofs,cache=auto,flock,posix_lock,writeback,xattr \
> >     --thread-pool-size=1 --socket-path=/tmp/vhostqemu
> >
> > (2) qemu
> > qemu-system-x86_64 \
> >     -enable-kvm \
> >     -name ubuntu \
> >     -cpu Westmere \
> >     -m 4096 \
> >     -global kvm-apic.vapic=false \
> >     -netdev tap,id=hn0,vhost=off,br=br0,helper=/usr/local/libexec/qemu-bridge-helper \
> >     -device e1000,id=e0,netdev=hn0 \
> >     -blockdev '{"node-name": "disk0", "driver": "qcow2", "refcount-cache-size": 1638400, "l2-cache-size": 6553600, "file": {"driver": "file", "filename": "'${imagefolder}/ubuntu.qcow2'"}}' \
> >     -device virtio-blk,drive=disk0,id=disk0 \
> >     -chardev socket,id=ch0,path=/tmp/vhostqemu \
> >     -device vhost-user-fs-pci,chardev=ch0,tag=myfs \
> >     -object memory-backend-memfd,id=mem,size=4G,share=on \
> >     -numa node,memdev=mem \
> >     -qmp stdio \
> >     -vnc :0
> >
> > (3) guest
> > mount -t virtiofs myfs /mnt/virtiofs
> >
> > I tried changing virtiofsd's --thread-pool-size value and testing the
> > storage performance with fio.
> > Before each read/write/randread/randwrite test, the page caches of the
> > guest and host are dropped.
> >
> > ```
> > RW="read" # or write/randread/randwrite
> > fio --name=test --rw=$RW --bs=4k --numjobs=1 --ioengine=libaio \
> >     --runtime=60 --direct=0 --iodepth=64 --size=10g \
> >     --filename=/mnt/virtiofs/testfile
> > ```
> >
> > --thread-pool-size=64 (default)
> > seq read: 305 MB/s
> > seq write: 118 MB/s
> > rand 4KB read: 2222 IOPS
> > rand 4KB write: 21100 IOPS
> >
> > --thread-pool-size=1
> > seq read: 387 MB/s
> > seq write: 160 MB/s
> > rand 4KB read: 2622 IOPS
> > rand 4KB write: 30400 IOPS
> >
> > The results show that performance with the default pool size (64) is
> > poorer than with a single thread.
> > Is this due to lock contention among the multiple threads?
> > When can virtio-fs get better performance from multiple threads?
> >
> > I also tested the performance when the guest accesses the host's files
> > via an NFSv4/CIFS network filesystem.
> > The "seq read" and "randread" performance of virtio-fs are also worse
> > than NFSv4 and CIFS.
> >
> > NFSv4:
> > seq write: 244 MB/s
> > rand 4K read: 4086 IOPS
> >
> > I cannot figure out why the performance of NFSv4/CIFS over the network
> > stack is better than virtio-fs.
> > Is this expected? Or do I have an incorrect configuration?
>
> No, I remember benchmarking the thread pool and did not see such a big
> difference.
>
> Please use direct=1 so that each I/O results in a virtio-fs request.
> Otherwise the I/O pattern is not directly controlled by the benchmark
> but by the page cache (readahead, etc).
>
> Using numactl(8) or taskset(1) to launch virtiofsd allows you to control
> NUMA and CPU scheduling properties. For example, you could force all 64
> threads to run on the same host CPU using taskset to see if that helps
> this I/O-bound workload.
>
> fio can collect detailed statistics on queue depths and a latency
> histogram. It would be interesting to compare the --thread-pool-size=64
> and --thread-pool-size=1 numbers.
>
> Comparing the "perf record -e kvm:kvm_exit" counts between the two might
> also be interesting.
>
> Stefan
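In case it is useful, here is one way to combine the taskset and direct-I/O suggestions above. This is only a sketch, not a tested recipe: the CPU number, the virtiofsd options (the earlier command with cache=none substituted and writeback dropped), and the fio parameters are assumptions carried over from this thread.

```
# Host: pin virtiofsd (and all of its pool threads) to one CPU.
taskset -c 0 ./virtiofsd -o source=/mnt/ssd/virtiofs,cache=none,flock,posix_lock,xattr \
    --thread-pool-size=64 --socket-path=/tmp/vhostqemu

# Guest: direct I/O so every request becomes a virtio-fs request. fio's
# default output already reports completion-latency (clat) percentiles,
# which can then be compared between --thread-pool-size=64 and =1.
fio --name=test --rw=randread --bs=4k --numjobs=1 --ioengine=libaio \
    --runtime=60 --direct=1 --iodepth=64 --size=10g \
    --filename=/mnt/virtiofs/testfile
```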