On Tue, 30 Jan 2024 at 21:57, Linus Torvalds <torva...@linux-foundation.org> wrote: > > Ugh.
Oh, and double-ugh on that tracefs_syscall_mkdir/rmdir(). I hate how it does that "unlock and re-lock inode" thing. It's a disease, and no, it's not an acceptable response to "lockdep shows there's a locking problem". The comment says "It is up to the individual mkdir routine to handle races" but that's *completely* bogus and shows a fundamental misunderstanding of locking. Dropping the lock in the middle of a locked section does NOT affect just the section that you dropped the lock around. It affects the *caller*, who took the lock in the first place, and who may well assume that the lock protects things for the whole duration of the lock. And so dropping the lock in the middle of an operation is a bad BAD *BAD* thing to do. Honestly, I had only been looking at the eventfs_inode.c lookup code. But now that I started looking at mkdir/rmdir, I'm seeing the same signs of horrible mistakes there too. And yes, it might be a real problem. For example, for the rmdir case, the actual lock was taken by 'vfs_rmdir()', and it does *not* protect only the ->rmdir() call itself. It also, for example, is supposed to make the ->rmdir be atomic wrt things like dentry->d_inode->i_flags |= S_DEAD; and dont_mount(dentry); detach_mounts(dentry); but now because the inode lock was dropped in the middle, for all we know a "mkdir()" could come in on the same name, and make a mockery of the whole inode locking. The inode lock on that directory that is getting removed is also what is supposed to make it impossible to add new files to the directory while the rmdir is going on. Again, dropping the lock violates those kinds of guarantees. "git blame" actually fingers Al for that "unlock and relock", but that's because Al did a search-and-replace on "mutex_[un]lock(&inode->i_mutex)" and replaced it with "inode_[un]lock(inode)" back in 2016. The real culprit is you, and that sh*t-show goes back to commit 277ba04461c2 ("tracing: Add interface to allow multiple trace buffers"). Christ. Now I guess I need to start looking if there is anything else horrid lurking in that mkdir/rmdir path. I doubt the inode locking problem ends up mattering in this situation. Mostly since this is only tracefs, and normal users shouldn't be able to access these things anyway. And I think (hope?) that you only accept mkdir/rmdir in specific places. But this really is another case of "This code smells *fundamentally* wrong". Adding Al, in the hopes that he will take a look at this too. Linus