Hi Brian,

sorry for the late reply, I'm just back from vacation and working
through my mail.

On 8/4/25 01:33, Brian Song wrote:
> 
> 
> On 2025-08-01 12:09 p.m., Brian Song wrote:
>> Hi Bernd,
>>
>> We are currently working on implementing termination support for
>> fuse-over-io_uring in QEMU, and right now we are focusing on how to clean up
>> in-flight SQEs properly. Our main question is about how well the kernel 
>> supports robust cancellation for these fuse-over-io_uring SQEs. Does it 
>> actually implement cancellation beyond destroying the io_uring queue?
>>
>> In QEMU FUSE export, we need a way to quickly and cleanly detach from 
>> the event loop and cancel any pending SQEs when an export is no longer 
>> in use. Ideally, we want to avoid the more drastic measure of having to 
>> close the entire /dev/fuse fd just to gracefully terminate outstanding 
>> operations.
>>
>> We are not sure if there's an existing code path that supports async 
>> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if 
>> additional callbacks might be needed to fully integrate with the 
>> kernel's async cancel mechanism. We also realized libfuse manages 
>> shutdowns differently, typically by signaling a thread via eventfd 
>> rather than relying on async cancel.
>>
>> Would love to hear your thoughts or suggestions on this!
>>
>> Thanks,
>> Brian
> 
> I looked into the kernel codebase and came up with some initial ideas, 
> which might not be entirely accurate:
> 
> The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring 
> resources and a limited set of request types. It does not clean up 
> resources related to fuse-over-io_uring, such as in-use entries.
> IORING_OP_ASYNC_CANCEL
> -> submit/enter
> -> io_uring/opdef.c:: .issue = io_async_cancel,
>       -> __io_async_cancel
>               -> io_try_cancel ==> Can only cancel a few types of requests
> 
> 
> Currently, full cleanup of both io_uring and FUSE data structures for
> fuse-over-io_uring only happens in two cases (because we mark these
> SQEs cancelable on every commit_and_fetch, as described below):
> 1. When the FUSE daemon exits (exit syscall)
> 2. During execve, which triggers the kernel path:
> 
> io_uring_files_cancel =>
> io_uring_try_cancel_uring_cmd =>
> file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
> 
> 
> 
> Below is a state diagram (mermaid graph) of a fuse_uring entry inside 
> the kernel:
> 
> graph TD
>      A["Userspace daemon"] --> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
>      B --> C["Create fuse_ring_ent"]
>      C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]
> 
>      E["FUSE filesystem operation"] --> F["Generate FUSE request"]
>      F --> G["fuse_uring_queue_fuse_req()"]
>      G --> H{"Check ent_avail_queue"}
> 
>      H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE request"]
>      H -->|No entry available| J["Request goes to fuse_req_queue and waits"]
> 
>      I --> K["fuse_uring_dispatch_ent()"]
>      K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
>      L --> M["Notify userspace to process"]
> 
>      N["Process exit / daemon termination"] --> O["io_uring_try_cancel_uring_cmd()<br/>>> NOTE Since we marked the entry IORING_URING_CMD_CANCELABLE<br/>in the previous fuse_uring_cmd, try_cancel_uring_cmd will call<br/>fuse_uring_cmd to 'delete' it <<"]
>      O --> P["fuse_uring_cancel()"]
>      P --> Q{"Is entry state AVAILABLE?"}
> 
>      Q -->|Yes| R[">> equivalent to 'delete' << Directly change to USERSPACE<br/>Move to ent_in_userspace"]
>      Q -->|No| S["Do nothing"]
> 
>      R --> T["io_uring_cmd_done(-ENOTCONN)"]
>      T --> U["Entry is 'disguised' as completed<br/>Will no longer handle new FUSE requests"]
> 
>      V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE requests from using this entry<br/>2. Release io_uring command resources<br/>3. Does not affect already assigned FUSE requests"]
> 
> 
> 
> When the kernel is waiting for VFS requests and the corresponding entry 
> is idle, its state is FRRS_AVAILABLE. Once a request is handed off to 
> the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
> 
> The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a 
> cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is 
> invoked to mark the entry state as FRRS_USERSPACE, making it unavailable 
> for future requests from the VFS.
> 
> If the IORING_URING_CMD_CANCELABLE flag is not set, before committing 
> and fetching, we first call fuse_uring_prepare_cancel to mark the entry 
> as IORING_URING_CMD_CANCELABLE. This indicates that if the daemon exits 
> or an execve happens during fetch, the kernel can call 
> io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and
> the related FUSE resources.
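
The entry life cycle described in the quoted paragraphs above can be
summarized as a small toy model. This is only a userspace sketch under
stated assumptions, not kernel code: the toy_* names and struct toy_ent
are hypothetical, the two states and -ENOTCONN are taken from the
description above, and locking and list handling are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: state names mirror the kernel's, nothing else does. */
enum ent_state { FRRS_AVAILABLE, FRRS_USERSPACE };

struct toy_ent {
    enum ent_state state;
    bool cancelable;  /* models IORING_URING_CMD_CANCELABLE being set */
    bool completed;   /* models io_uring_cmd_done() having posted a CQE */
    int result;       /* CQE result, meaningful once completed is true */
};

/* Models fuse_uring_prepare_cancel(): opt the entry in to cancellation. */
static void toy_prepare_cancel(struct toy_ent *ent)
{
    assert(ent != NULL);
    ent->cancelable = true;
}

/* Models fuse_uring_cancel(): only an idle entry is completed early. */
static void toy_cancel(struct toy_ent *ent)
{
    assert(ent != NULL);
    if (ent->state != FRRS_AVAILABLE)
        return;                   /* a request is already assigned: do nothing */
    ent->state = FRRS_USERSPACE;  /* no longer eligible for new FUSE requests */
    ent->completed = true;
    ent->result = -ENOTCONN;
}

/* Models io_uring_try_cancel_uring_cmd(): skips entries never marked cancelable. */
static void toy_try_cancel(struct toy_ent *ent)
{
    assert(ent != NULL);
    if (ent->cancelable)
        toy_cancel(ent);
}
```

In this model an entry that was never marked cancelable is left alone, an
idle cancelable entry completes with -ENOTCONN, and an entry that is
already in FRRS_USERSPACE is untouched, which matches the "Do nothing"
branch of the diagram.
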
> 
> Back to our previous issue, when deleting a FUSE export in QEMU, we hit 
> a crash due to an invalid CQE handler. This happened because the SQEs we 
> previously submitted hadn't returned yet by the time we shut down and 
> deleted the export.
> 
> To avoid this, we need to ensure that no further CQEs are returned and 
> no CQE handler is triggered. We need to either:
> 
> * Prevent any further user operations before calling blk_exp_close_all
> 
> or
> 
> * Require userspace to trigger a few specific operations that cause
> the kernel to return all outstanding CQEs, after which the daemon can
> send an io_uring_cmd with the IO_URING_F_CANCEL flag (the "delete
> operation") to mark all entries as unavailable (FRRS_USERSPACE),
> ensuring the kernel won't assign them to future VFS requests.
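
On the daemon side, the requirement that no CQE handler runs after the
export is deleted could be sketched as a simple in-flight counter. This
is a toy model under stated assumptions, not QEMU code: struct
toy_export and the toy_* functions are hypothetical, and the model
assumes the outstanding CQEs do eventually arrive (which is exactly the
part that needs kernel support).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical type: not QEMU's actual export structure. */
struct toy_export {
    unsigned in_flight;  /* SQEs submitted whose CQE has not arrived yet */
    bool closing;        /* shutdown was requested */
};

static struct toy_export *toy_export_new(void)
{
    return calloc(1, sizeof(struct toy_export));
}

/* Call when an SQE is submitted. */
static void toy_submit(struct toy_export *exp)
{
    assert(!exp->closing);  /* no new SQEs once shutdown has started */
    exp->in_flight++;
}

/* Frees the export only once nothing can still reference it. */
static bool toy_maybe_free(struct toy_export *exp)
{
    if (exp->closing && exp->in_flight == 0) {
        free(exp);
        return true;
    }
    return false;
}

/* CQE handler: returns true if this was the last CQE and the export was freed. */
static bool toy_complete(struct toy_export *exp)
{
    assert(exp->in_flight > 0);
    exp->in_flight--;
    return toy_maybe_free(exp);
}

/* Shutdown request: frees immediately only if nothing is in flight. */
static bool toy_close(struct toy_export *exp)
{
    exp->closing = true;
    return toy_maybe_free(exp);
}
```

With two SQEs outstanding, toy_close() returns false and the export
stays alive; the second toy_complete() call is the one that frees it, so
a late CQE never sees a dangling handler.
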
> 
> 
> 

I have to admit that I'm confused about why you can't use umount. Isn't
that the most graceful way to shut down a connection?

If you need another custom way for some reason, we probably need
to add it.


Thanks,
Bernd
