On Mon, 8 Sept 2025 at 14:42, Steven Rostedt <rost...@goodmis.org> wrote:
>
> I just re-read the entire thread, and I'm still not sure where to go with
> this.

So honestly, I don't know how to get where you want to get - or
whether it's even *possible* without horrible performance impact.

And no, we're not adding crap interfaces to mmap/munmap just for a
stupid sysfs tracing thing.

> Ideally, the user space stack trace should look like:
>
> futex_requeue-1044 [002] ..... 168.761423: <user stack unwind>
> cookie=31500000003
> => <000000000009a9ee> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1
> build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}
> => <0000000000001472> : path=/work/c/futex_requeue
> build_id={0xc02417ea,0x1f4e0143,0x338cf27d,0x506a7a5d,0x7884d090}
> => <0000000000092b7b> : path=/usr/lib/x86_64-linux-gnu/libselinux.so.1
> build_id={0x3ba6e0c2,0xdd815e8,0xe1821a58,0xa5940cef,0x7c7bc5ab}

Yes. And I think that's what you should aim to generate. Not inode
numbers, because inode numbers are the wrong thing.

> Note, the build-id isn't really necessary for my own use case, because the
> applications seldom change on a chromebook. I added it as it appears to be
> useful for others I've talked to that would like to use this.

My personal suspicion is that in reality, the pathname is sufficient.
It's certainly a lot better than inode numbers are, in that the
pathname is meaningful even after-the-fact, and even on a different
machine etc.

It's not some guaranteed match with some particular library or
executable version, no. But for some random one-time quick scripting
thing that uses sysfs, it's probably "good enough".

The build id is certainly very convenient too, but it's not *always*
convenient. And 99% of the time you could just look up the build id
from the path, even though obviously that wouldn't work across
machines and wouldn't work across system updates.

> But printing a copy of the full path name and build-id at every stack trace
> is expensive. The path lookup may not be so bad, but the space on the ring
> buffer is.

So that's the thing. You can do it right, or you can do it wrong.
I'd personally tend to prefer the "expensive but right", and just make
it a trace-time option.

> To compensate this, we could replace the path and build-id with a unique
> identifier, (being an inode/device or hash, or whatever) to associate that
> file. It may even work if it is unique per task. Then whenever one of these
> identifiers were to show up representing a new file, it would be printed.

So I really hate the inode number, because it's just wrong.

You can't match it across machines, and to make things worse it's not
even *meaningful* over time or over machines - or to humans - so it's
strictly clearly objectively worse than the pathname.

But more importantly - even on the *local* machine - and at the moment
- it's actually wrong. Exactly because the inode number you look up is
*not* the user-visible inode number from 'stat()'.

So it's *really* wrong to use the inode number. It's basically never
right. And never will be, even if you can make it appear useful in
some specific cases.

The *one* saving grace for the inode number is that *in*the*moment*
you can match it against /proc/<pid>/maps, because that /proc file has
that historical bug too (it wasn't buggy at the time that /proc file
was introduced, but our filesystems have become much more complex
since).

So if you do that

    inode = file_user_inode(vma->vm_file);

that I mentioned, at least the otherwise random inode numbers can be
matched to *something*.

That still doesn't fix the other issues with inode numbers, but it
means that at the time of the trace - and on the machine that the
tracing is done - you can now match that not-quite-real inode number
and device against another /proc file, and turn it into a pathname.

But it's kind of sad to do that, when you could just do the pathname
in the trace directly, and not force the stupid interface in the first
place.
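
To make that extra user-space step concrete, here is a rough sketch of the matching. It only assumes the documented /proc/<pid>/maps line layout (address, perms, offset, dev, inode, optional pathname). The function name parse_maps is hypothetical, and it is an assumption that a trace would report the device in the same major:minor form that the maps file prints.

```python
def parse_maps(text):
    """Build a {(dev, inode): path} table from /proc/<pid>/maps-format text.

    Hypothetical helper for illustration only. The keys are the device
    string and inode number exactly as the maps file reports them.
    """
    table = {}
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        dev, inode, path = fields[3], int(fields[4]), fields[5].rstrip()
        if inode == 0:
            continue  # pseudo-entries such as [heap], [stack], [vdso]
        table[(dev, inode)] = path
    return table


if __name__ == "__main__":
    # Snapshot the table for the current process. This has to happen
    # while the traced process is still alive.
    with open("/proc/self/maps") as f:
        for (dev, inode), path in sorted(parse_maps(f.read()).items()):
            print(dev, inode, path)
```

Note that the table is only as good as the moment the maps file was read: a mapping that came and went between snapshots has no entry, so its inode number cannot be turned back into a path.
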

And honestly, at that point it's still not really *better* than the
pathname (and arguably much much worse, because you might not be able
to do the matching if you didn't catch the /proc/<pid>/maps file).

So the inode number - together with a lookup in /proc/<pid>/maps - is
generally about the same as just giving a path, but typically much
less convenient, and anybody using that interface would have to do
extra work in user space.

And *none* of these issues would be true of somebody who uses the
'perf()' interface that can do all of this much more efficiently, and
without the downsides, and without any artificially limited sysfs
interfaces.

So that really makes me go: just don't expose this at all in sysfs
files. You *cannot* do a good job in sysfs, because the interface is
strictly worse than just doing the proper job using perf.

Alternatively, just do the expensive thing. Expose the actual
pathname, and expose the build ID. Yes, it's expensive, but dammit,
that's the whole *point* of tracing in sysfs. sysfs was never about
being efficient, it was about convenience.

So if you trace through sysfs, you either don't get the full
information that could be there, or pay the price for the expense of
generating the full info.

Make the "give me the expensive output" be a dynamic flag, so that you
don't do it by default, but if you have some model where you are
scripting things with shell-script rather than doing 'perf record', at
least you get good output.

Hmm?

             Linus