On 9/13/23, David Chisnall <thera...@freebsd.org> wrote:
> On 12 Sep 2023, at 17:19, Bakul Shah <ba...@iitbombay.org> wrote:
>>
>> FreeBSD should add inotify.
>
> inotify is also probably not the right thing. If someone is interested in
> adding this, Apple’s fsevents API is a better inspiration. It is carefully
> designed to ensure that the things monitoring for events can’t ever block
> filesystem operations from making progress.

I'm not sure what you mean here specifically and I don't see anything
careful about what they did.

From userspace POV the API is allowed to drop events, which makes life
easy on this front and is probably the right call.

The implementation is utterly horrid. For one, the non-blocking aspect
starts with the obvious equivalent of uma_zalloc(..., M_NOWAIT) and
bailing if it fails, except if you read past that to actual registration
it can perform an alloc which can block indefinitely while holding on to
some vnodes:

        // if we haven't gotten the path yet, get it.
        if (pathbuff == NULL) {
                pathbuff = get_pathbuff();
                pathbuff_len = MAXPATHLEN;

where get_pathbuff is:

        return zalloc(ZV_NAMEI);

So the notification routine can block indefinitely in a low-memory
condition. I tried to figure out if this is ever called without other
vnodes write-locked (as opposed to "just" refed), but their code is such
a mess that my level of curiosity was dwarfed by the difficulty of getting
a definitive answer.

Other than that it is terribly inefficient and artificially limited to 8
processes which can do anything. That is to say it is unfit for anything
but laptop-scale usage.

Perhaps you meant it does not block if the watchers decide to not
process any events, but that's almost inherently true if one allows for
lossy notifications.

> I think there’s a nice design
> possible with a bloom filter in the kernel of events that ensures that
> monitors may get spurious events but don’t miss out on anything.
> [snip]
> I think the right kernel API would walk the directory and add the vnodes to
> a bloom filter and trigger a notification on a match in the filter. You’d
> then have occasional spurious notifications but you’d have something that
> could be monitored via kqueue and could be made to not block anything else
> in the kernel.
>

I don't see how this can work. A directory can have more inodes than you
can have vnodes at any point.
So if you add vnodes to a list as you go, they may fall off so that you
can continue adding other entries. But perhaps you mean you could store
the inode number as opposed to holding on to the vnode? Even then, the
number of entries to scan to make it happen is so big that it is going to
be impractical on anything but laptop-scale.

What can be fast is checking if the parent dir wants notifications, but
this ignores changes to hardlinks. Except *currently* the VFS layer does
not reliably track who the parent is (and in fact it can fail to spot
one).

The VFS layer contains a lot of cruft and design decisions which at
least today are questionable at best, but fixable. A big chunk of this
concerns name caching, which currently is entirely optional.

Should someone want to propose an API for file change notifications,
they need to state something which if implemented does not result in
unfixable drag on the layer, even if the initial implementation would be
suboptimal. Handling arbitrary hardlinks looks like a drag to me, but I'm
happy to review an implementation which avoids being a problem.

That is to say, a laptop-scale API can probably be implemented as is,
but a solution which can provide reliable events (not to be confused with
reliably notifying about all events) would require numerous changes.

-- 
Mateusz Guzik <mjguzik gmail.com>
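
For illustration only, the inode-number variant of the bloom filter idea
discussed above could look roughly like the userspace C sketch below. It
is not a proposed kernel interface and not anyone's actual implementation:
the names (watch_bloom, watch_bloom_add, watch_bloom_maybe_watched), the
filter size, and the hash mixing are all assumptions made up for the
example. It shows the property the quoted proposal relies on (a lookup can
report a spurious match but never misses an inode that was added), and it
also makes the cost objection visible: every directory entry has to be fed
through watch_bloom_add() before the filter is of any use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizing; the bit count must be a power of two. */
#define WATCH_BLOOM_BITS   4096u
#define WATCH_BLOOM_HASHES 3u

/* Fixed-size filter: no allocation is needed at lookup time. */
struct watch_bloom {
	uint64_t bits[WATCH_BLOOM_BITS / 64];
};

/* 64-bit mixer (splitmix64 finalizer), used here as the hash function. */
uint64_t
watch_bloom_mix(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return (x ^ (x >> 31));
}

void
watch_bloom_init(struct watch_bloom *wb)
{
	memset(wb, 0, sizeof(*wb));
}

/* Record an inode number; this is the per-entry cost of the directory walk. */
void
watch_bloom_add(struct watch_bloom *wb, uint64_t ino)
{
	uint64_t h1 = watch_bloom_mix(ino);
	uint64_t h2 = watch_bloom_mix(h1) | 1;	/* odd stride for double hashing */

	for (unsigned i = 0; i < WATCH_BLOOM_HASHES; i++) {
		uint64_t bit = (h1 + i * h2) & (WATCH_BLOOM_BITS - 1);
		wb->bits[bit / 64] |= 1ULL << (bit % 64);
	}
}

/*
 * Returns false only if the inode was definitely never added.
 * A true result may be a false positive (a spurious notification).
 */
bool
watch_bloom_maybe_watched(const struct watch_bloom *wb, uint64_t ino)
{
	uint64_t h1 = watch_bloom_mix(ino);
	uint64_t h2 = watch_bloom_mix(h1) | 1;

	for (unsigned i = 0; i < WATCH_BLOOM_HASHES; i++) {
		uint64_t bit = (h1 + i * h2) & (WATCH_BLOOM_BITS - 1);
		if ((wb->bits[bit / 64] & (1ULL << (bit % 64))) == 0)
			return (false);
	}
	return (true);
}
```

Note that a plain bloom filter like this cannot remove entries, so an
unlink or a rename out of the directory would leave stale bits behind
until the filter is rebuilt, and the false-positive rate grows with the
number of entries added to a fixed-size filter.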