On Mon, Mar 24, 2025 at 04:05:09PM +0800, saz97 wrote:
> This patch series refactors QEMU's FUSE export module to leverage coroutines
> for read/write operations, addressing concurrency limitations and aligning
> with QEMU's asynchronous I/O model. The changes demonstrate measurable
> performance improvements while simplifying resource management.
>
> 1. Technical implementation
>
> Following Stefan's suggestion, I moved the processing logic of
> read_from_fuse_export into a coroutine for buffer management, and changed
> fuse_getattr to call bdrv_co_get_allocated_file_size().
>
> 2. Performance summary
>
> For the coroutine_integration_fuse test, the average results for iodepth=1
> and iodepth=64 are as follows:
> -------------------------------
> Average results for iodepth=1:
> Read_IOPS:  coroutine_integration_fuse: 4492.88 | origin: 4309.39 | 4.25% improvement
> Write_IOPS: coroutine_integration_fuse: 4500.68 | origin: 4318.68 | 4.21% improvement
> Read_BW:    coroutine_integration_fuse: 17971.00 KB/s | origin: 17237.30 KB/s | 4.26% improvement
> Write_BW:   coroutine_integration_fuse: 18002.50 KB/s | origin: 17274.30 KB/s | 4.23% improvement
> -------------------------------
> Average results for iodepth=64:
> Read_IOPS:  coroutine_integration_fuse: 5576.93 | origin: 5347.13 | 4.29% improvement
> Write_IOPS: coroutine_integration_fuse: 5569.55 | origin: 5337.33 | 4.33% improvement
> Read_BW:    coroutine_integration_fuse: 22311.40 KB/s | origin: 21392.20 KB/s | 4.31% improvement
> Write_BW:   coroutine_integration_fuse: 22282.20 KB/s | origin: 21353.20 KB/s | 4.34% improvement
> -------------------------------
> Although all metrics improve, the gains are concentrated in the 4.2%–4.3%
> range, which is lower than expected. Profiling with gprof points to the
> likely reasons for this limited improvement.
>
> 3. Performance bottlenecks identified via gprof
>
> After running a fio test with the following command:
>
>   fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 \
>       --rw=randrw --bs=4k --time_based=1 --name=job1 \
>       --filename=/mnt/qemu-fuse --iodepth=64
>
> and analyzing the execution profile with gprof, I identified the following
> issues:
>
> 3.1 Increased overall execution time
> In the original implementation, fuse_write + blk_pwrite accounted for 8.7%
> of total execution time (6.0% + 2.7%). After refactoring,
> fuse_write_coroutine + blk_co_pwrite account for 43.1% (22.9% + 20.2%).
> This suggests that coroutine overhead contributes significantly to
> execution time.
>
> 3.2 Increased read and write calls
> fuse_write calls increased from 173,400 to 333,232.
> fuse_read calls increased from 173,526 to 332,931.
> This indicates that the coroutine-based approach introduces redundant I/O
> calls, likely due to unnecessary coroutine switches.
>
> 3.3 Significant coroutine overhead
> qemu_coroutine_enter is now called 1,572,803 times, compared to ~476,057
> previously. This frequent coroutine switching adds unnecessary overhead and
> limits the expected performance improvement.
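The improvement percentages and profile ratios quoted above are plain arithmetic on the reported averages and gprof counters. The C sketch below makes that arithmetic explicit. It assumes "improvement" means relative change, (new - origin) / origin × 100. The helper names pct_change, growth, enters_per_request and print_profile_summary are hypothetical and are not part of QEMU. Computing directly from the averages can differ from the quoted percentages by a few hundredths of a percent. That would be expected if the quoted figures were averaged per run, though this is also an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `value` over `baseline`, in percent
 * (positive means `value` is larger). Hypothetical helper. */
static double pct_change(double value, double baseline)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* Factor by which a counter grew from `before` to `after`. */
static double growth(double after, double before)
{
    assert(before != 0.0);
    return after / before;
}

/* Average number of coroutine enters per FUSE read/write call.
 * This is a plain ratio of profile counters, not a causal measure. */
static double enters_per_request(double enters, double reads, double writes)
{
    assert(reads + writes > 0.0);
    return enters / (reads + writes);
}

/* Print the derived figures for the counters quoted in section 3. */
static void print_profile_summary(void)
{
    /* 3.1: combined share of total execution time, in percent. */
    printf("write path share: %.1f%% -> %.1f%%\n", 6.0 + 2.7, 22.9 + 20.2);

    /* 3.2 and 3.3: call-count growth factors. */
    printf("fuse_write calls: x%.2f\n", growth(333232, 173400));
    printf("fuse_read calls:  x%.2f\n", growth(332931, 173526));
    printf("coroutine enters: x%.2f\n", growth(1572803, 476057));

    /* qemu_coroutine_enter calls per fuse_read + fuse_write call. */
    printf("enters per request: %.2f -> %.2f\n",
           enters_per_request(476057, 173526, 173400),
           enters_per_request(1572803, 332931, 333232));
}
```

For example, pct_change(4500.68, 4318.68) evaluates to about 4.21, which matches the iodepth=1 Write_IOPS figure. The enters_per_request ratio rises from roughly 1.37 to 2.36. On average, each FUSE read/write call now coincides with about one extra qemu_coroutine_enter. This is a correlation in the counters and does not by itself show where the extra switches come from.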

Due to the remaining performance issues, let's leave this contribution task
here. Please focus on submitting your Google Summer of Code application at
https://summerofcode.withgoogle.com/ by April 8th.

Thanks,
Stefan

>
> saz97 (1):
>   Integration coroutines into fuse export
>
>  block/export/fuse.c | 190 +++++++++++++++++++++++++++++---------------
>  1 file changed, 126 insertions(+), 64 deletions(-)
>
> --
> 2.34.1
>