On Mon, 29 Jan 2024 11:51:52 -0800 Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Mon, 29 Jan 2024 at 11:24, Linus Torvalds > <torva...@linux-foundation.org> wrote: > > > > So the patch was completely broken. Here's the one that should > > actually compile (although still not actually *tested*). > > Note that this fixes the d_instantiate() ordering wrt initializing the inode. > > But as I look up the call chain, I see many more fundamental mistakes. > > Steven - the reason you think that the VFS doesn't have documentation > is that we *do* have tons of documentation, but it's of the kind "Here > is what you should do". > > It is *not* of the kind that says "You messed up and did something > else, and how do you recover from it?". When I first worked on this, I did read all the VFS documentation, and it was difficult to understand what I needed and what I didn't. It is focused on a real file system and not a pseudo ones. Another mistake I made was thinking that debugfs was doing things the "right" way as well. And having a good idea of how debugfs worked, and thinking it was correct, just made reading VFS documentation even more difficult as I couldn't relate what I knew (about debugfs) with what was being explained. I thought it was perfectly fine to use dentry as a handle for the file system. I did until you told me it wasn't. That made a profound change in my understanding of how things are supposed to work. > > So the fundamental bug I now find is that eventfs_root_lookup() gets a > target dentry, and for some unfathomable reason it then does > > ret = simple_lookup(dir, dentry, flags); > > on it. Which is *completely* broken, because what "simple_lookup()" > does is just say "oh, you didn't have a dentry of this kind before, so > clearly a lookup must be a non-existent file". Remember: this is for > 'tmpfs' kinds of filesystems where the dentry cache cotnains *ALL* > files. Sorry, I don't really understand what you mean by "ALL files"? You mean that all files in the pseudo file system has a dentry to it (like debugfs, and the rest of tracefs)? > > For the tracefs kind of filesystem, it's TOTALLY BOGUS. What the > "simple_lookup()" will do is just a plain > > d_add(dentry, NULL); > > and nothing else. And guess what *that* does? It basically > instantiates a negative dentry, telling all other lookups that the > path does not exist. > > So if you have two concurrent lookups, one will do that > simple_lookup(), and the other will then - depending on timing - > either see the negative dentry and return -ENOENT, or - if it comes in > a bit later - see the new inode that then later gets added by the > first lookup with d_instantiate(). > > See? That simple_lookup() is not just unnecessary, but it's also > actively completely WRONG. Because it instantiates a NULL pointer, > other processes that race with the lookup may now end up saying "that > file doesn't exist", even though it should. > > Basically, you can't use *any* of the "simple" filesystem helpers. > Because they are all designed for that "the dentry tree is all there > is" case. Yeah, the above code did come about with me not fully understanding the above. It's not that it wasn't documented, but I admit, when I read the VFS documentation, I had a lot of trouble trying to make sense of things like negative dentries and how they relate. I now have a much better understanding of most of this, thanks to our discussion here, and also knowing that using dentry as the main handle to a file is *not* how to do it. When thinking it was, it made things much more difficult to comprehend. -- Steve