This patch series refactors QEMU's FUSE export module to use coroutines for
read/write operations, addressing concurrency limitations and aligning with
QEMU's asynchronous I/O model. The changes show measurable performance
improvements while simplifying resource management.

1. Technical Implementation

   Following Stefan's suggestion, I moved the processing logic of
   read_from_fuse_export into a coroutine for buffer management, and changed
   fuse_getattr to call bdrv_co_get_allocated_file_size().
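
   As a rough analogy only (Python asyncio, not QEMU's C coroutine API), the
   sketch below shows the general pattern of handing each request to its own
   coroutine so that a slow request does not hold up a fast one. The names
   handle_request and export_loop and the delays are hypothetical; they do
   not correspond to functions in block/export/fuse.c.

```python
import asyncio


async def handle_request(req_id, backend_delay, completed):
    # Hypothetical per-request coroutine. The await marks the point where a
    # real implementation would yield while block-layer I/O is in flight.
    await asyncio.sleep(backend_delay)
    completed.append(req_id)


async def export_loop(requests):
    # Spawn one coroutine per request instead of processing them serially.
    completed = []
    tasks = [
        asyncio.create_task(handle_request(req_id, delay, completed))
        for req_id, delay in requests
    ]
    await asyncio.gather(*tasks)
    return completed


# Request 1 is slow, request 2 is fast; request 2 finishes first.
completion_order = asyncio.run(export_loop([(1, 0.05), (2, 0.01)]))
print(completion_order)
```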

2. Performance Summary

   For the coroutine_integration_fuse test, the average results for
   iodepth=1 and iodepth=64 are as follows:

    Average results for iodepth=1:
    Metric      coroutine_integration_fuse  origin         Improvement
    ----------  --------------------------  -------------  -----------
    Read_IOPS   4492.88                     4309.39        4.25%
    Write_IOPS  4500.68                     4318.68        4.21%
    Read_BW     17971.00 KB/s               17237.30 KB/s  4.26%
    Write_BW    18002.50 KB/s               17274.30 KB/s  4.23%

    Average results for iodepth=64:
    Metric      coroutine_integration_fuse  origin         Improvement
    ----------  --------------------------  -------------  -----------
    Read_IOPS   5576.93                     5347.13        4.29%
    Write_IOPS  5569.55                     5337.33        4.33%
    Read_BW     22311.40 KB/s               21392.20 KB/s  4.31%
    Write_BW    22282.20 KB/s               21353.20 KB/s  4.34%

   Although all metrics show improvements, the gains are concentrated in the
   4.2%–4.3% range, which is lower than expected. Further investigation using
   gprof reveals the reasons for this limited improvement.
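
   As a cross-check, the improvement percentages can be recomputed from the
   averages listed above. This is a minimal sketch; it assumes the gain is
   defined as (new - old) / old, which the summary does not state.

```python
# (metric, iodepth) -> (coroutine value, origin value, reported gain in %)
RESULTS = {
    ("Read_IOPS", 1): (4492.88, 4309.39, 4.25),
    ("Write_IOPS", 1): (4500.68, 4318.68, 4.21),
    ("Read_BW", 1): (17971.00, 17237.30, 4.26),
    ("Write_BW", 1): (18002.50, 17274.30, 4.23),
    ("Read_IOPS", 64): (5576.93, 5347.13, 4.29),
    ("Write_IOPS", 64): (5569.55, 5337.33, 4.33),
    ("Read_BW", 64): (22311.40, 21392.20, 4.31),
    ("Write_BW", 64): (22282.20, 21353.20, 4.34),
}


def improvement_pct(new, old):
    """Relative gain of `new` over `old`, in percent (assumed definition)."""
    return (new - old) / old * 100.0


recomputed = {
    key: improvement_pct(new, old) for key, (new, old, _) in RESULTS.items()
}

for (metric, depth), pct in recomputed.items():
    reported = RESULTS[(metric, depth)][2]
    print(f"iodepth={depth:<2} {metric:<10} "
          f"recomputed {pct:.2f}%  reported {reported:.2f}%")
```

   The recomputed values differ from the reported ones by a few hundredths of
   a percentage point at most. That would be consistent with the reported
   percentages having been averaged per run, but this explanation is a guess.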

3. Performance Bottlenecks Identified via gprof
   After running a fio test with the following command:

     fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 \
         --rw=randrw --bs=4k --time_based=1 --name=job1 \
         --filename=/mnt/qemu-fuse --iodepth=64

   and analyzing the execution profile using gprof, the following issues were
   identified:

   3.1 Increased Share of Execution Time
   In the original implementation, fuse_write + blk_pwrite accounted for 8.7%
   of total execution time (6.0% + 2.7%).
   After refactoring, fuse_write_coroutine + blk_co_pwrite accounts for 43.1%
   (22.9% + 20.2%).
   This suggests that coroutine overhead contributes significantly to
   execution time.
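
   The arithmetic behind these figures, as a small sketch. Note that gprof
   percentages are shares of profiled time, not absolute durations, so the
   ratio below describes how the write path's share grew rather than how
   much longer a write takes.

```python
# gprof "% time" figures quoted above, per function.
before = {"fuse_write": 6.0, "blk_pwrite": 2.7}
after = {"fuse_write_coroutine": 22.9, "blk_co_pwrite": 20.2}

before_total = sum(before.values())
after_total = sum(after.values())

# How many times larger the write path's share of profiled time became.
share_growth = after_total / before_total

print(f"before: {before_total:.1f}%  after: {after_total:.1f}%  "
      f"growth: {share_growth:.2f}x")
```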

   3.2 Increased Read and Write Calls
   fuse_write calls increased from 173,400 → 333,232.
   fuse_read calls increased from 173,526 → 332,931.
   This indicates that the coroutine-based approach is introducing redundant
   I/O calls, likely due to unnecessary coroutine switches.
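
   To put the call counts in proportion, the sketch below compares their
   growth with the read IOPS gain at iodepth=64 from the performance summary.
   It assumes the profiled runs are comparable to the benchmark runs, which
   is not stated here.

```python
# gprof call counts quoted above.
calls_before = {"fuse_write": 173_400, "fuse_read": 173_526}
calls_after = {"fuse_write": 333_232, "fuse_read": 332_931}

call_ratio = {
    name: calls_after[name] / calls_before[name] for name in calls_before
}

# Read IOPS at iodepth=64 (coroutine / origin), for comparison.
iops_ratio = 5576.93 / 5347.13

for name, ratio in call_ratio.items():
    print(f"{name}: {ratio:.2f}x calls")
print(f"read IOPS: {iops_ratio:.3f}x")
```

   Both call counts grow by roughly 1.9x while IOPS grows by only about
   1.04x, so the extra calls cannot be explained by additional completed
   requests alone.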

   3.3 Significant Coroutine Overhead
   qemu_coroutine_enter is now called 1,572,803 times, compared to ~476,057
   previously.
   This frequent coroutine switching introduces unnecessary overhead, limiting
   the expected performance improvements.
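
   One way to normalize these numbers is coroutine entries per FUSE
   read/write call, using the call counts from 3.2. This is a derived,
   illustrative metric rather than something gprof reports directly, and the
   "before" entry count is approximate in the profile.

```python
# qemu_coroutine_enter call counts quoted above.
enters_before = 476_057  # approximate
enters_after = 1_572_803

# fuse_write + fuse_read call counts from section 3.2.
fuse_calls_before = 173_400 + 173_526
fuse_calls_after = 333_232 + 332_931

enter_ratio = enters_after / enters_before
per_call_before = enters_before / fuse_calls_before
per_call_after = enters_after / fuse_calls_after

print(f"qemu_coroutine_enter calls: {enter_ratio:.2f}x")
print(f"entries per fuse call: {per_call_before:.2f} -> {per_call_after:.2f}")
```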

saz97 (1):
  Integration coroutines into fuse export

 block/export/fuse.c | 190 +++++++++++++++++++++++++++++---------------
 1 file changed, 126 insertions(+), 64 deletions(-)

-- 
2.34.1

