On Mon, Mar 24, 2025 at 04:05:09PM +0800, saz97 wrote:
> This patch series refactors QEMU's FUSE export module to leverage coroutines
> for read/write operations, addressing concurrency limitations and aligning
> with QEMU's asynchronous I/O model. The changes demonstrate measurable
> performance improvements while simplifying resource management.
> 
> 1. Technology implementation
> 
>    Following Stefan's suggestion, I moved the processing logic of
>    read_from_fuse_export into a coroutine for buffer management, and
>    changed fuse_getattr to call bdrv_co_get_allocated_file_size().
> 
> 2. Performance summary
> 
>    For the coroutine_integration_fuse test, the average results for
>    iodepth=1 and iodepth=64 are as follows:
>     -------------------------------
>     Average results for iodepth=1:
>     Metric      coroutine_integration_fuse  origin         Improvement
>     Read_IOPS   4492.88                     4309.39        4.25%
>     Write_IOPS  4500.68                     4318.68        4.21%
>     Read_BW     17971.00 KB/s               17237.30 KB/s  4.26%
>     Write_BW    18002.50 KB/s               17274.30 KB/s  4.23%
>     --------------------------------
>     -------------------------------
>     Average results for iodepth=64:
>     Metric      coroutine_integration_fuse  origin         Improvement
>     Read_IOPS   5576.93                     5347.13        4.29%
>     Write_IOPS  5569.55                     5337.33        4.33%
>     Read_BW     22311.40 KB/s               21392.20 KB/s  4.31%
>     Write_BW    22282.20 KB/s               21353.20 KB/s  4.34%
>     --------------------------------
>    Although all metrics show improvements, the gains are concentrated in
>    the 4.2%–4.3% range, which is lower than expected. Further investigation
>    using gprof reveals the reasons for this limited improvement.
> 
> 3. Performance Bottlenecks Identified via gprof
>    After running a fio test with the following command:
>    fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 \
>        --rw=randrw --bs=4k --time_based=1 --name=job1 \
>        --filename=/mnt/qemu-fuse --iodepth=64
>    and analyzing the execution profile using gprof, the following issues
>    were identified:
> 
>    3.1 Increased Overall Execution Time
>    In the original implementation, fuse_write + blk_pwrite accounted for
>    8.7% of total execution time (6.0% + 2.7%).
>    After refactoring, fuse_write_coroutine + blk_co_pwrite now accounts for
>    43.1% (22.9% + 20.2%).
>    This suggests that coroutine overhead is contributing significantly to
>    execution time.
> 
>    3.2 Increased Read and Write Calls
>    fuse_write calls increased from 173,400 → 333,232.
>    fuse_read calls increased from 173,526 → 332,931.
>    This indicates that the coroutine-based approach is introducing
>    redundant I/O calls, likely due to unnecessary coroutine switches.
> 
>    3.3 Significant Coroutine Overhead
>    qemu_coroutine_enter is now called 1,572,803 times, compared to
>    ~476,057 previously.
>    This frequent coroutine switching introduces unnecessary overhead,
>    limiting the expected performance improvements.
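
For anyone re-checking the figures in the quoted analysis, the ratios can be
recomputed with a short sketch like the one below. It is only an illustration:
the helper name relative_change and the dictionary layout are assumptions made
here, and the numbers are copied from the quoted gprof summary (the
qemu_coroutine_enter baseline is approximate in the original).

```python
def relative_change(new: float, old: float) -> float:
    """Return the percentage change going from old to new."""
    return (new - old) / old * 100.0


# Call counts copied from the quoted gprof summary: (before, after).
# The qemu_coroutine_enter "before" value is approximate in the original mail.
call_counts = {
    "fuse_write": (173_400, 333_232),
    "fuse_read": (173_526, 332_931),
    "qemu_coroutine_enter": (476_057, 1_572_803),
}

for name, (before, after) in call_counts.items():
    ratio = after / before
    change = relative_change(after, before)
    print(f"{name}: {ratio:.2f}x the calls ({change:+.1f}%)")
```

The same helper applies to the IOPS and bandwidth rows in section 2, with the
"origin" value passed as old.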

Due to the remaining performance issues, let's leave this contribution
task here.

Please focus on submitting your Google Summer of Code application at
https://summerofcode.withgoogle.com/ by April 8th.

Thanks,
Stefan

> 
> saz97 (1):
>   Integration coroutines into fuse export
> 
>  block/export/fuse.c | 190 +++++++++++++++++++++++++++++---------------
>  1 file changed, 126 insertions(+), 64 deletions(-)
> 
> -- 
> 2.34.1
> 
