On Sun, Mar 16, 2025 at 01:30:06AM +0800, saz97 wrote:
> Signed-off-by: Changzhi Xie <s...@qq.com>
>
> FUSE Export Coroutine Integration Cover Letter
>
> This patch series refactors QEMU's FUSE export module to leverage coroutines
> for read/write operations,
> addressing concurrency limitations and aligning with QEMU's asynchronous I/O
> model. The changes
> demonstrate measurable performance improvements while simplifying resource
> management.
>
> 1. Technical Implementation
> Key modifications address prior review feedback (Stefan Hajnoczi) and
> optimize execution flow:
>
> 1.1 Coroutine Integration
> Convert fuse_read()/fuse_write() to launch coroutines (fuse_*_coroutine)
> Utilize non-blocking blk_co_pread()/blk_co_pwrite() for block layer access
> Eliminate main loop blocking during heavy I/O workloads
>
> 1.2 Buffer Management
> Removed explicit buffer pre-allocation in read_from_fuse_export()
> Replaced fuse_buf_free() with g_free() due to libfuse3 API constraints
>
> 1.3 Resource Lifecycle
> Moved in_flight decrement and blk_exp_unref() into coroutines
> Added FUSE opcode checks (FUSE_READ/FUSE_WRITE) to prevent premature cleanup
>
> 1.4 Structural Improvements
> Simplified FuseIORequest structure:
> Removed redundant fuse_ino_t and fuse_file_info fields
> Retained minimal parameter passing requirements
>
> 2. Performance Validation
> Tested using fio with 4K random RW pattern, and the result is the average of
> 5 runs:
> fio --ioengine=io_uring --numjobs=1 --runtime=30 --ramp_time=5 --rw=randrw
> --bs=4k --time_based=1
>
> Key Results
>
> Metric          iodepth=1               iodepth=64
> Read Latency    ▼ 2.7% (3.8k→3kns)      ▼ 1.3% (4.7M→4.6M ns)
> Write Latency   ▼ 3.6% (112k→108kns)    ▼ 2.8% (5.2M→5.0M ns)
> Read IOPS       4740 → 4729 (±0.2%)     ▲ 2.1% (6391→6529)
> Write IOPS      4738 → 4727 (±0.2%)     ▲ 2.2% (6390→6529)
> Throughput      ~18.9 GB/s (stable)     ▲ 2.1% (25.6→26.1 GB/s)

Are you sure throughput is GB/s instead of MB/s? At iodepth=1, read
4729 IOPS * bs=4k = 18 MB/s.

Also, fio was configured with --rw=randrw, so the total throughput should be
read throughput + write throughput. Based on the read and write IOPS numbers,
the total throughput should be ~36 MB/s. Which throughput number are you
showing?

>
> Analysis
>
> High Concurrency (iodepth=64):
> Sustained throughput gains (+2.1-2.2%) demonstrate improved scalability
> Latency reductions confirm reduced contention in concurrent operations

This is surprising. Before this patch series, the FUSE export code only
submitted 1 request at a time, so the iodepth=64 results should be only a
little better than the iodepth=1 results. After this patch series, the FUSE
export code should be submitting all 64 requests concurrently, which should
improve performance by much more than 2%.

Why was the improvement only 2%?

>
> saz97 (1):
>   Integration coroutines into fuse export
>
>  block/export/fuse.c | 189 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 132 insertions(+), 57 deletions(-)
>
> --
> 2.34.1
>
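
For reference, the throughput sanity check above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the helper name throughput_mib_per_s is hypothetical, fio's bs=4k is taken to mean 4096 bytes, throughput is expressed in MiB/s (2**20 bytes), and the IOPS values are the post-patch figures quoted in the table.

```python
# Sanity-check the throughput figures from the quoted IOPS numbers.
# Assumption: fio's --bs=4k means 4096 bytes per request.
BLOCK_SIZE_BYTES = 4 * 1024


def throughput_mib_per_s(iops, block_size=BLOCK_SIZE_BYTES):
    """Convert an IOPS figure into MiB/s (1 MiB = 2**20 bytes)."""
    return iops * block_size / 2**20


# Post-patch IOPS values quoted in the cover letter's results table.
results = {
    "iodepth=1": {"read": 4729, "write": 4727},
    "iodepth=64": {"read": 6529, "write": 6529},
}

for depth, iops in results.items():
    read = throughput_mib_per_s(iops["read"])
    write = throughput_mib_per_s(iops["write"])
    # With --rw=randrw the total is the sum of both directions.
    print(f"{depth}: read {read:.1f} MiB/s, write {write:.1f} MiB/s, "
          f"total {read + write:.1f} MiB/s")
```

Under these assumptions the iodepth=1 case works out to roughly 18 MiB/s per direction and about 37 MiB/s combined, so the numbers are in the tens of MB/s rather than GB/s.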