On Sat, 30 Aug 2025 12:03:53 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Sat, 30 Aug 2025 at 11:31, Steven Rostedt <rost...@goodmis.org> wrote:
> >
> > If we are going to rely on mmap, then we might as well get rid of the
> > vma_lookup() altogether. The mmap event will have the mapping of the
> > file to the actual virtual address.
>
> It actually won't - not unless you also track every mremap etc.
>
> Which is certainly doable, but I'd argue that it's a lot of complexity.
>
> All you really want is an ID for the file mapping, and yes, I agree
> that it's very very annoying that we don't have anything that can then
> be correlated to user space any other way than also having a stage
> that tracks mmap.
>
> I've slept on it and tried to come up with something, and I can't. As
> mentioned, the inode->i_ino isn't actually exposed to user space as
> such at all for some common filesystems, so while it's very
> traditional, it really doesn't actually work. It's also almost
> impossible to turn into a path, which is what you often would want for
> many cases.
>
> That said, having slept on it, I'm starting to come around to the
> inode number model, not because I think it's a good model - it really
> isn't - but because it's a very historical mistake.
>
> And in particular, it's the same mistake we made in /proc/<xyz>/maps.
>
> So I think it's very very wrong, but it does have the advantage that
> it's a number that we already do export.
>
> But the inode we expose that way isn't actually the
> 'vma->vm_file->f_inode' as you'd think, it's actually
>
>         inode = file_user_inode(vma->vm_file);
>
> which is subtly different for the backing inode case (ie overlayfs).
>
> Oh, how I dislike that thing, but using the same thing as
> /proc/<xyz>/maps does avoid some problems.
>

Sorry for the late reply. I left for the Tracing Summit the following
Monday, and when I got back home on Thursday, I came down with a nasty
cold that prevented me from thinking about any of this.

I just re-read the entire thread, and I'm still not sure where to go
with this. Thus, let me start with what I'm trying to accomplish, and
even add one example of a real world use case we would like to have.

We have repeatedly found issues with futexes causing applications to
either lock up or suffer long latencies. Since a futex is mostly managed
in user space, it's good to be able to at least have a backtrace of
where a contended futex occurs. Thus we start tracing the futex system
call and triggering a user space backtrace on each one. Using this
information can help us figure out where the futex contention lies.

This is just one use case, we do have others.

Ideally, the user space stack trace should look like:

  futex_requeue-1044 [002] ..... 168.761423: <user stack unwind> cookie=31500000003
 => <000000000009a9ee> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1 build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}
 => <0000000000001472> : path=/work/c/futex_requeue build_id={0xc02417ea,0x1f4e0143,0x338cf27d,0x506a7a5d,0x7884d090}
 => <0000000000092b7b> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1 build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}

Where the above shows the callstack (offset from the file), the path to
the file, and a build id of that file such that the tooling can verify
that the path is indeed the same library/executable as when the trace
was taken.

Note, the build-id isn't really necessary for my own use case, because
the applications seldom change on a chromebook. I added it as it appears
to be useful for others I've talked to that would like to use this.

But printing a copy of the full path name and build-id at every stack
trace is expensive. The path lookup may not be so bad, but the space on
the ring buffer is. To compensate for this, we could replace the path
and build-id with a unique identifier (an inode/device pair, a hash, or
whatever) to associate with that file. It may even work if it is only
unique per task.

Then whenever one of these identifiers shows up representing a new file,
its mapping would be printed. We could also monitor an event so that if
a file is deleted, renamed, or whatever, and a new file with the same
name comes around, the identifier with its path and build-id gets
printed for the new file.

The output would instead be:

  sed-1037 [007] ...1. 167.362583: file_map: hash=0x51eff94b path=/usr/lib/x86_64-linux-gnu/libselinux.so.1 build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}
  [..]
  futex_requeue-1042 [007] ...1. 168.754128: file_map: hash=0xad2c6f1b path=/work/c/futex_requeue build_id={0xc02417ea,0x1f4e0143,0x338cf27d,0x506a7a5d,0x7884d090}
  [..]
  futex_requeue-1042 [007] ..... 168.757912: <user stack unwind> cookie=34900000008
 => <00000000001001ca> : 0x51eff94b
 => <000000000000173c> : 0xad2c6f1b
 => <0000000000029ca8> : 0x51eff94b

  [.. repeats several more traces without having to save the path names again ..]

It comes down to: when do we print these mappings?

I noticed that uprobes has hooks into all the mappings in the vma code,
as it needs to keep track of them. We could change those hooks to
tracepoints, and have both uprobes and tracing monitor the changes, and
when a new mapping happens, it traces it. Changing them to tracepoints
may be useful anyway, as it would then turn them into static branches
instead of a normal "if" statement.

We could even add a file to tracefs that would trigger a dump of all
files that are mapped executable for all currently running tasks. Then
when tracing starts, it would trigger the "show all currently running
task mappings" and after that only trace the mappings as they happen.

This way, the tracer would get the mappings of the identifier (or hash,
or whatever) to the files and build-ids at the start of tracing, as well
as get any of the mappings when they happen later on.

This should be enough information for the post processing to put the
stack traces back into the ideal form shown at the start.
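
To make the post-processing step concrete, here is a rough sketch of
what the tooling side could do. It is only an illustration: the regular
expressions simply match the example output in this email (the real
event format is not settled), it assumes one stack frame per line, and
resolve_trace() is a made-up name, not an existing tool.

```python
import re

# Hypothetical patterns that mirror the example output in this email.
FILE_MAP_RE = re.compile(
    r"file_map: hash=(0x[0-9a-f]+) path=(\S+) build_id=(\{[^}]*\})")
FRAME_RE = re.compile(r"=> <([0-9a-f]+)> : (0x[0-9a-f]+)\s*$")


def resolve_trace(lines):
    """Replace each frame's identifier with the path and build-id that
    the most recent file_map event announced for that identifier."""
    files = {}  # identifier -> (path, build_id)
    out = []
    for line in lines:
        m = FILE_MAP_RE.search(line)
        if m:
            # A later file_map for the same identifier replaces the old
            # entry (e.g. the file was deleted and recreated).
            files[m.group(1)] = (m.group(2), m.group(3))
            continue  # hide the mapping event itself from the output
        m = FRAME_RE.search(line)
        if m and m.group(2) in files:
            path, build_id = files[m.group(2)]
            out.append(f" => <{m.group(1)}> : path={path} build_id={build_id}")
        else:
            # Not a frame, or an identifier we never saw a mapping for:
            # pass the line through untouched.
            out.append(line)
    return out
```

Frames whose identifier was never announced are left as-is, which is
why dumping the existing mappings when tracing starts matters.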

That is, the tooling could output:

  futex_requeue-1044 [002] ..... 168.761423: <user stack unwind> cookie=31500000003
 => <000000000009a9ee> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1 build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}
 => <0000000000001472> : path=/work/c/futex_requeue build_id={0xc02417ea,0x1f4e0143,0x338cf27d,0x506a7a5d,0x7884d090}
 => <0000000000092b7b> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1 build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}

and hide the identifier that was used in the ring buffer.

-- Steve