Hi Brian,

sorry for my late reply, just back from vacation and fighting through
my mails.

On 8/4/25 01:33, Brian Song wrote:
>
> On 2025-08-01 12:09 p.m., Brian Song wrote:
>> Hi Bernd,
>>
>> We are currently working on implementing termination support for
>> fuse-over-io_uring in QEMU, and right now we are focusing on how to
>> clean up in-flight SQEs properly. Our main question is about how well
>> the kernel supports robust cancellation for these fuse-over-io_uring
>> SQEs. Does it actually implement cancellation beyond destroying the
>> io_uring queue?
>>
>> In QEMU FUSE export, we need a way to quickly and cleanly detach from
>> the event loop and cancel any pending SQEs when an export is no longer
>> in use. Ideally, we want to avoid the more drastic measure of having to
>> close the entire /dev/fuse fd just to gracefully terminate outstanding
>> operations.
>>
>> We are not sure if there's an existing code path that supports async
>> cancel for these in-flight SQEs in the fuse-over-io_uring setup, or if
>> additional callbacks might be needed to fully integrate with the
>> kernel's async cancel mechanism. We also realized libfuse manages
>> shutdowns differently, typically by signaling a thread via eventfd
>> rather than relying on async cancel.
>>
>> Would love to hear your thoughts or suggestions on this!
>>
>> Thanks,
>> Brian
>
> I looked into the kernel codebase and came up with some initial ideas,
> which might not be entirely accurate:
>
> The IORING_OP_ASYNC_CANCEL operation can only cancel io_uring ring
> resources and a limited set of request types. It does not clean up
> resources related to fuse-over-io_uring, such as in-use entries.
>
> IORING_OP_ASYNC_CANCEL
> -> submit/enter
> -> io_uring/opdef.c:: .issue = io_async_cancel,
> -> __io_async_cancel
> -> io_try_cancel ==> Can only cancel few types of requests
>
> Currently, full cleanup of both io_uring and FUSE data structures for
> fuse-over-io_uring only happens in two cases: [since we have mark these
> SQEs cancelable when we commit_and_fetch everytime(mentioned below)]
> 1.When the FUSE daemon exits (exit syscall)
> 2.During execve, which triggers the kernel path:
>
> io_uring_files_cancel =>
> io_uring_try_cancel_uring_cmd =>
> file->f_op->uring_cmd(cmd, IO_URING_F_CANCEL | IO_URING_F_COMPLETE_DEFER)
>
> Below is a state diagram (mermaid graph) of a fuse_uring entry inside
> the kernel:
>
> graph TD
> A["Userspace daemon"] --> B["FUSE_IO_URING_CMD_REGISTER<br/>Register buffer"]
> B --> C["Create fuse_ring_ent"]
> C --> D["State: FRRS_AVAILABLE<br/>Added to ent_avail_queue"]
>
> E["FUSE filesystem operation"] --> F["Generate FUSE request"]
> F --> G["fuse_uring_queue_fuse_req()"]
> G --> H{"Check ent_avail_queue"}
>
> H -->|Entry available| I["Take entry from queue<br/>Assign to FUSE request"]
> H -->|No entry available| J["Request goes to fuse_req_queue and waits"]
>
> I --> K["fuse_uring_dispatch_ent()"]
> K --> L["State: FRRS_USERSPACE<br/>Move to ent_in_userspace"]
> L --> M["Notify userspace to process"]
>
> N["Process exit / daemon termination"] --> O["io_uring_try_cancel_uring_cmd() <br/> >> NOTE Since we marked the entry IORING_URING_CMD_CANCELABLE <br/> in the previous fuse_uring_cmd , try_cancel_uring_cmd will call <br/> fuse_uring_cmd to 'delete' it <<"]
> O --> P["fuse_uring_cancel()"]
> P --> Q{"Is entry state AVAILABLE?"}
>
> Q -->|Yes| R[">> equivalent to 'delete' << Directly change to USERSPACE<br/>Move to ent_in_userspace"]
> Q -->|No| S["Do nothing"]
>
> R --> T["io_uring_cmd_done(-ENOTCONN)"]
> T --> U["Entry is 'disguised' as completed<br/>Will no longer handle new FUSE requests"]
>
> V["Practical effects of cancellation:"] --> W["1. Prevent new FUSE requests from using this entry<br/>2. Release io_uring command resources<br/>3. Does not affect already assigned FUSE requests"]
>
> When the kernel is waiting for VFS requests and the corresponding entry
> is idle, its state is FRRS_AVAILABLE. Once a request is handed off to
> the userspace daemon, the entry's state transitions to FRRS_USERSPACE.
>
> The fuse_uring_cmd function handles the COMMIT_AND_FETCH operation. If a
> cmd call carries the IO_URING_F_CANCEL flag, fuse_uring_cancel is
> invoked to mark the entry state as FRRS_USERSPACE, making it unavailable
> for future requests from the VFS.
>
> If the IORING_URING_CMD_CANCELABLE flag is not set, before committing
> and fetching, we first call fuse_uring_prepare_cancel to mark the entry
> as IORING_URING_CMD_CANCELABLE. This indicates that if the daemon exits
> or an execve happens during fetch, the kernel can call
> io_uring_try_cancel_uring_cmd to safely clean up these SQEs/CQEs and
> related fuse resource.
>
> Back to our previous issue, when deleting a FUSE export in QEMU, we hit
> a crash due to an invalid CQE handler. This happened because the SQEs we
> previously submitted hadn't returned yet by the time we shut down and
> deleted the export.
>
> To avoid this, we need to ensure that no further CQEs are returned and
> no CQE handler is triggered. We need to either:
>
> * Prevent any further user operations before calling blk_exp_close_all
>
> or
>
> * Require the userspace to trigger few specific operations that causes
> the kernel to return all outstanding CQEs, and then the daemon can send
> io_uring_cmd with the IO_URING_F_CANCEL flag to mark all entries as
> unavailable (FRRS_USERSPACE) "delete operation", ensuring the kernel
> won't assign them to future VFS requests.
>

I have to admit that I'm confused why you can't use umount, isn't that
the most graceful way to shutdown a connection?

If you need another custom way for some reason, we probably need to add
it.


Thanks,
Bernd
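
For reference, the cancel behaviour described in the quoted analysis
(only an idle FRRS_AVAILABLE entry is moved to FRRS_USERSPACE and
completed with -ENOTCONN, anything else is left alone) can be modelled
in a few lines of plain C. This is only a userspace sketch of that
description, not kernel code: struct model_ent, model_cancel() and
model_cancel_all() are hypothetical names made up for the illustration,
and locking and the queue lists are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* The two entry states named in the quoted text (assumed subset). */
enum frrs_state { FRRS_AVAILABLE, FRRS_USERSPACE };

/* Hypothetical, simplified stand-in for a ring entry. */
struct model_ent {
    enum frrs_state state;
    bool cmd_pending; /* true while the SQE has not produced a CQE */
    int cqe_res;      /* result posted when the command completes */
};

/*
 * Model of the cancel path: only an idle (AVAILABLE) entry is touched.
 * Returns true if a CQE carrying -ENOTCONN was posted for it.
 */
static bool model_cancel(struct model_ent *ent)
{
    assert(ent != NULL);
    if (ent->state != FRRS_AVAILABLE)
        return false;            /* "Do nothing" branch of the diagram */
    ent->state = FRRS_USERSPACE; /* not eligible for new requests */
    ent->cmd_pending = false;
    ent->cqe_res = -ENOTCONN;
    return true;
}

/* Cancel every entry; returns how many CQEs the caller should expect. */
static size_t model_cancel_all(struct model_ent *ents, size_t n)
{
    size_t completed = 0;

    for (size_t i = 0; i < n; i++)
        if (model_cancel(&ents[i]))
            completed++;
    return completed;
}
```

The return value of model_cancel_all() is the point of the sketch: a
daemon tearing down an export would still have to reap that many CQEs
before freeing its handlers, and entries already in FRRS_USERSPACE are
not affected by the cancel at all.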