On Wed, 13 Aug 2025, Al Viro wrote: > On Tue, Aug 12, 2025 at 12:25:05PM +1000, NeilBrown wrote: > > This patch is the first step in introducing a new API for locked > > operation on names in directories. It supports operations that create or > > remove names. Rename operations will also be part of this new API but > > require different specific interfaces. > > > > The plan is to lock just the dentry (or dentries), not the whole > > directory. dentry_lookup() combines locking the directory and > > performing a lookup prior to a change to the directory. On success it > > returns a dentry which is consider to be locked, though at this stage > > the whole parent directory is actually locked. > > > > dentry_lookup_noperm() does the same without needing a mnt_idmap and > > without checking permissions. This is useful for internal filesystem > > management (e.g. creating virtual files in response to events) and in > > other cases similar to lookup_noperm(). > > Details, please. I seriously hope that simple_start_creating() will > end up used for all of those; your variant allows passing LOOKUP_... > flags and I would like to understand what the usecases will be.
simple_start_creating() would meet a lot of needs. A corresponding simple_start_deleting() would suit cachefiles_lookup_for_cull(), fuse_reverse_inval_entry(), nfsd4_unlink_clid_dir() etc. btrfs_ioctl_snap_destroy() would want a simple_start_deleting() but also wants killable. cachefiles_get_directory() wants a simple_start_creating() without the LOOKUP_EXCL so that if is already exists, it can go ahead and use the dentry without creating. cachefiles_commit_tmpfile() has a similar need - if it exists it will unlink and repeat the lookup. Once it doesn't it it will be target of vfs_link() nfsd3_create_file() wants simple_start_creating without LOOKUP_EXCL. as do a few other nfsd functions. nfsd4_list_rec_dir() effectively wants simple_start_deleting() (i.e. fail if it doesn't exist) but sometimes it won't delete, it will do something else. All calls pass one of: 0 LOOKUP_CREATE LOOKUP_CREATE | LOOKUP_EXCL The first two aren't reliably called for any particular task so a "simple_start_XXX" sort of name doesn't seem appropriate. > > What's more, IME the "intent" arguments are best avoided - better have > separate primitives; if internally they call a common helper with some > flags, etc., it's their business, but exposing that to callers ends > up with very unpleasant audits down the road. As soon as you get > callers that pass something other than explicit constants, you get > data flow into the mix ("which values can end up passed in this one?") > and that's asking for trouble. lookup_no_create, lookup_may_create, lookup_must_create ??? Either as function names, or as an enum to pass to the function? If we had separate functions we would need _noperm and potentially _killable versions of each. Fortunately there is no current need for _noperm_killable. Maybe that combinatorial explosion isn't too bad. > > > __dentry_lookup() is a VFS-internal interface which does no permissions > > checking and assumes that the hash of the name has already been stored > > in the qstr. This is useful following filename_parentat(). > > > > done_dentry_lookup() is provided which performs the inverse of putting > > the dentry and unlocking. > > Not sure I like the name, TBH... I'm open to suggestions for alternatives. > > > Like lookup_one_qstr_excl(), dentry_lookup() returns -ENOENT if > > LOOKUP_CREATE was NOT given and the name cannot be found,, and returns > > -EEXIST if LOOKUP_EXCL WAS given and the name CAN be found. > > > > These functions replace all uses of lookup_one_qstr_excl() in namei.c > > except for those used for rename. > > > > They also allow simple_start_creating() to be simplified into a > > static-inline. > > Umm... You've also moved it into linux/namei.h; we'd better verify that > it's included in all places that need that... I added includes where necessary. > > > A __free() class is provided to allow done_dentry_lookup() to be called > > transparently on scope exit. dget() is extended to ignore ERR_PTR()s > > so that "return dget(dentry);" is always safe when dentry was provided > > by dentry_lookup() and the variable was declared __free(dentry_lookup). > > Please separate RAII stuff from the rest of that commit. Deciding if > it's worth doing in any given case is hard to do upfront. I'd rather not - it does make a few changes much nicer. But I can if it is necessary. > > > lookup_noperm_common() and lookup_one_common() are moved earlier in > > namei.c. > > Again, separate commit - reduce the noise in less trivial ones. > > > +struct dentry *dentry_lookup(struct mnt_idmap *idmap, > > + struct qstr *last, > > + struct dentry *base, > > + unsigned int lookup_flags) > > Same problem with flags, *ESPECIALLY* if your endgame involves the > locking of result dependent upon those. Locking the result happens precisely if a non-error is returned. The lookup flags indicate which circumstances result in errors. > > > - dput(dentry); > > + done_dentry_lookup(dentry); > > dentry = ERR_PTR(error); > > -unlock: > > - inode_unlock(path->dentry->d_inode); > > Incidentally, this combination (dput()+unlock+return ERR_PTR()) > is common enough. Might be worth a helper (taking error as argument; > that's one of the reasons why I'm not sure RAII is a good fit for this > problem space) I found RAII worked quite well in several places and very well in a few. I think the main reason I had for *not* using RAII is that you really need to use it for everything and I didn't want to change code too much. > > > +/* no_free_ptr() must not be used here - use dget() */ > > +DEFINE_FREE(dentry_lookup, struct dentry *, if (_T) done_dentry_lookup(_T)) > > UGH. Please, take that to the very end of the series. And the comment > re no_free_ptr() and dget() is really insufficient - you will get a dentry > reference that outlives your destructor, except that locking environment will > change. Which is subtle enough to warrant a discussion. > > Besides, I'm not fond of mixing that with dget(); put another way, this > subset of dget() uses is worth being clearly marked as such. A primitive > that calls dget() to do work? Sure, no problem. > > We have too many dget() callers with very little indication of what we are > doing there (besides "bump the refcount"); tree-in-dcache series will > at least peel off the ones that are about pinning a created object in > ramfs-style filesystems. That's not going to cover everything (not even > close), but let's at least not add to the already messy pile... > Thanks for the review, NeilBrown