On Fri, Feb 20, 2026 at 4:32 PM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Thu, Feb 19, 2026 at 09:54:47PM -0800, T.J. Mercier wrote:
> > Currently some kernfs files (e.g. cgroup.events, memory.events) support
> > inotify watches for IN_MODIFY, but unlike with regular filesystems, they
> > do not receive IN_DELETE_SELF or IN_IGNORED events when they are
> > removed. This means inotify watches persist after file deletion until
> > the process exits and the inotify file descriptor is cleaned up, or
> > until inotify_rm_watch is called manually.
> >
> > This creates a problem for processes monitoring cgroups. For example, a
> > service monitoring memory.events for memory.high breaches needs to know
> > when a cgroup is removed to clean up its state. Where it's known that a
> > cgroup is removed when all processes die, without IN_DELETE_SELF the
> > service must resort to inefficient workarounds such as:
> >   1) Periodically scanning procfs to detect process death (wastes CPU
> >      and is susceptible to PID reuse).
> >   2) Holding a pidfd for every monitored cgroup (can exhaust file
> >      descriptors).
> >
> > This patch enables IN_DELETE_SELF and IN_IGNORED events for kernfs files
> > and directories by clearing inode i_nlink values during removal. This
> > allows VFS to make the necessary fsnotify calls so that userspace
> > receives the inotify events.
> >
> > As a result, applications can rely on a single existing watch on a file
> > of interest (e.g. memory.events) to receive notifications for both
> > modifications and the eventual removal of the file, as well as automatic
> > watch descriptor cleanup, simplifying userspace logic and improving
> > efficiency.
> >
> > There is a gap in this implementation for certain file removals due to
> > their unique nature in kernfs. Directory removals that trigger file removals
> > occur through vfs_rmdir, which shrinks the dcache and emits fsnotify
> > events after the rmdir operation; there is no issue here. However kernfs
> > writes to particular files (e.g. cgroup.subtree_control) can also cause
> > file removal, but vfs_write does not attempt to emit fsnotify events
> > after the write operation, even if i_nlink counts are 0. As a use case
> > for monitoring this category of file removals is not known, they are
> > left without having IN_DELETE or IN_DELETE_SELF events generated.
>
> Adding a comment with the above content would probably be useful. It also
> might be worthwhile to note that fanotify recursive monitoring wouldn't work
> reliably as cgroups can go away while inodes are not attached.

Sigh.. it's a shame to grow more weird semantics.

But I take this back to the POV of "remote" vs. "local" vfs notifications:
the IN_DELETE_SELF events added by this change are actually "local" vfs
notifications.

If we wanted to support monitoring the cgroup fs super block for all
added/removed cgroups with fanotify, we could implement this as "remote"
notifications, and in that case adding explicit fsnotify() calls could
make sense.

Thanks,
Amir.

