On Sun, Mar 16, 2025 at 01:30:06AM +0800, saz97 wrote:
> Signed-off-by: Changzhi Xie <s...@qq.com>
> 
> FUSE Export Coroutine Integration Cover Letter
> 
> This patch series refactors QEMU's FUSE export module to leverage coroutines
> for read/write operations, addressing concurrency limitations and aligning
> with QEMU's asynchronous I/O model. The changes demonstrate measurable
> performance improvements while simplifying resource management.
> 
> 1. Technical Implementation
> Key modifications address prior review feedback (Stefan Hajnoczi) and 
> optimize execution flow:
> 
> 1.1 Coroutine Integration
> Convert fuse_read()/fuse_write() to launch coroutines (fuse_*_coroutine)
> Utilize non-blocking blk_co_pread()/blk_co_pwrite() for block layer access
> Eliminate main loop blocking during heavy I/O workloads
> 
> 1.2 Buffer Management
> Removed explicit buffer pre-allocation in read_from_fuse_export()
> Replaced fuse_buf_free() with g_free() due to libfuse3 API constraints
> 
> 1.3 Resource Lifecycle
> Moved in_flight decrement and blk_exp_unref() into coroutines
> Added FUSE opcode checks (FUSE_READ/FUSE_WRITE) to prevent premature cleanup
> 
> 1.4 Structural Improvements
> Simplified FuseIORequest structure:
> Removed redundant fuse_ino_t and fuse_file_info fields
> Retained minimal parameter passing requirements
> 
> 2. Performance Validation
> Tested using fio with 4K random RW pattern, and the result is the average
> of 5 runs:
> fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 --rw=randrw \
>     --bs=4k --time_based=1
> 
> Key Results
> 
> Metric          iodepth=1                iodepth=64
> Read Latency    ▼ 2.7% (3.8k→3k ns)      ▼ 1.3% (4.7M→4.6M ns)
> Write Latency   ▼ 3.6% (112k→108k ns)    ▼ 2.8% (5.2M→5.0M ns)
> Read IOPS       4740 → 4729 (±0.2%)      ▲ 2.1% (6391→6529)
> Write IOPS      4738 → 4727 (±0.2%)      ▲ 2.2% (6390→6529)
> Throughput      ~18.9 GB/s (stable)      ▲ 2.1% (25.6→26.1 GB/s)

Are you sure throughput is GB/s instead of MB/s?

At iodepth=1: 4729 read IOPS * 4 KB per request (bs=4k) = ~18 MB/s

Also, fio was configured with --rw=randrw, so the total throughput
should be read throughput + write throughput. Based on the read and
write IOPS numbers, the total throughput should be ~36 MB/s. Which
throughput number are you showing?
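
For reference, here is a rough Python sketch of that arithmetic. It is only
a sanity check, not part of the patch. It assumes fio's --bs=4k means
4096-byte requests and uses the iodepth=1 "after" IOPS from the table:

```python
# Sanity check of the units in the quoted results table.
# Assumption: fio's --bs=4k means 4096-byte requests.
BLOCK_SIZE = 4096


def mib_per_s(iops, block_size=BLOCK_SIZE):
    """Convert an IOPS figure to MiB/s for fixed-size requests."""
    return iops * block_size / 2**20


# iodepth=1 IOPS after the patch series, copied from the table.
read_iops, write_iops = 4729, 4727

read_tp = mib_per_s(read_iops)
write_tp = mib_per_s(write_iops)
total_tp = read_tp + write_tp  # --rw=randrw: reads and writes both count

print(f"read  {read_tp:.1f} MiB/s")
print(f"write {write_tp:.1f} MiB/s")
print(f"total {total_tp:.1f} MiB/s")
```

Either way the result is tens of megabytes per second, not gigabytes. Note
also that 4729 * 4 KB is about 18.9 MB/s in decimal units, which matches the
"~18.9" in the table, so the table may be showing read-only throughput in
MB/s. That is my guess, please confirm.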

> 
> Analysis
> 
> High Concurrency (iodepth=64):
> Sustained throughput gains (+2.1-2.2%) demonstrate improved scalability
> Latency reductions confirm reduced contention in concurrent operations

This is surprising. Before this patch series, the FUSE export code
submitted only 1 request at a time, so the iodepth=64 results should be
only a little better than the iodepth=1 results. After this patch series,
the FUSE export code should be submitting all 64 requests concurrently, so
I would expect performance to improve by much more than 2%.

Why was the improvement only 2%?
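
To put numbers on that, here is a small Python sketch that compares total
(read + write) IOPS at iodepth=64 against iodepth=1, before and after the
series. The figures are copied from the table above; the helper names are
mine and purely illustrative:

```python
# IOPS from the quoted table, stored as (before, after) the patch series.
RESULTS = {
    1: {"read": (4740, 4729), "write": (4738, 4727)},
    64: {"read": (6391, 6529), "write": (6390, 6529)},
}


def total_iops(iodepth, when):
    """Sum read and write IOPS for one iodepth, 'before' or 'after'."""
    idx = 0 if when == "before" else 1
    return sum(pair[idx] for pair in RESULTS[iodepth].values())


def scaling(when):
    """How many times faster iodepth=64 is than iodepth=1."""
    return total_iops(64, when) / total_iops(1, when)


for when in ("before", "after"):
    print(f"{when}: iodepth=64 is {scaling(when):.2f}x iodepth=1")
```

By this measure the queue-depth scaling barely changes with the patch
applied, which is why I suspect requests are still not being processed
concurrently.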

> 
> saz97 (1):
>   Integration coroutines into fuse export
> 
>  block/export/fuse.c | 189 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 132 insertions(+), 57 deletions(-)
> 
> -- 
> 2.34.1
> 
