I've done a lot of profiling of Rust code using tools that are based around stack traces. I can't emphasize how useful it is to have inlined stack frames.
Here is a highly typical example from a profile I had lying around: #1: 0x655A213: alloc (alloc.rs:75) #2: 0x655A213: alloc (alloc.rs:151) #3: 0x655A213: reserve_internal<(syntax::tokenstream::TokenTree, syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:668) #4: 0x655A213: reserve<(syntax::tokenstream::TokenTree, syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:491) #5: 0x655A213: reserve<(syntax::tokenstream::TokenTree, syntax::tokenstream::IsJoint)> (vec.rs:457) #6: 0x655A213: push<(syntax::tokenstream::TokenTree, syntax::tokenstream::IsJoint)> (vec.rs:1023) #7: 0x655A213: parse_token_trees_until_close_delim (tokentrees.rs:27) #8: 0x655A213: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81) #9: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26) #10: 0x655A17F: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81) #11: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26) #12: 0x655A17F: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader>::parse_token_tree (tokentrees.rs:81) That stack trace shows inlined frames. Notice that there are only three distinct addresses. Without the inlined frames, there would only be three frames, something like this: #1: 0x655A213: alloc (alloc.rs:75) #2: 0x655A213: reserve_internal<(syntax::tokenstream::TokenTree, syntax::tokenstream::IsJoint),alloc::alloc::Global> (raw_vec.rs:668) #3: 0x655A17F: parse_token_trees_until_close_delim (tokentrees.rs:26) Here is a more extreme (but still not *so* unusual) example: #1: 0x5794562: alloc (alloc.rs:75) #2: 0x5794562: exchange_malloc (alloc.rs:185) #3: 0x5794562: clone<rustc::mir::Rvalue> (boxed.rs:332) #4: 0x5794562: clone (mod.rs:1759) #5: 0x5794562: <rustc::mir::Statement as core::clone::Clone>::clone ( mod.rs:1733) #6: 0x579E30A: {{closure}}<core::slice::Iter<rustc::mir::Statement>,rustc::mir::Statement,(),closure> (mod.rs:272) #7: 0x579E30A: fold<rustc::mir::Statement,(),closure> (mod.rs:3150) #8: 0x579E30A: fold<core::slice::Iter<rustc::mir::Statement>,rustc::mir::Statement,(),closure> (mod.rs:272) #9: 0x579E30A: for_each<core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::Statement>>,closure> (iterator.rs:604) #10: 0x579E30A: spec_extend<rustc::mir::Statement,core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::Statement>>> (vec.rs:1852) #11: 0x579E30A: spec_extend<rustc::mir::Statement,core::slice::Iter<rustc::mir::Statement>> (vec.rs:1902) #12: 0x579E30A: extend_from_slice<rustc::mir::Statement> (vec.rs:1351) #13: 0x579E30A: to_vec<rustc::mir::Statement> (slice.rs:160) #14: 0x579E30A: to_vec<rustc::mir::Statement> (slice.rs:380) #15: 0x579E30A: clone<rustc::mir::Statement> (vec.rs:1642) #16: 0x579E30A: <rustc::mir::BasicBlockData as core::clone::Clone>::clone ( mod.rs:1022) #17: 0x564271A: {{closure}}<core::slice::Iter<rustc::mir::BasicBlockData>,rustc::mir::BasicBlockData,(),closure> (mod.rs:272) #18: 0x564271A: fold<rustc::mir::BasicBlockData,(),closure> (mod.rs:3150) #19: 0x564271A: fold<core::slice::Iter<rustc::mir::BasicBlockData>,rustc::mir::BasicBlockData,(),closure> (mod.rs:272) #20: 0x564271A: for_each<core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::BasicBlockData>>,closure> (iterator.rs:604) #21: 0x564271A: spec_extend<rustc::mir::BasicBlockData,core::iter::adapters::Cloned<core::slice::Iter<rustc::mir::BasicBlockData>>> (vec.rs:1852) #22: 0x564271A: spec_extend<rustc::mir::BasicBlockData,core::slice::Iter<rustc::mir::BasicBlockData>> (vec.rs:1902) #23: 0x564271A: extend_from_slice<rustc::mir::BasicBlockData> (vec.rs:1351) #24: 0x564271A: to_vec<rustc::mir::BasicBlockData> (slice.rs:160) #25: 0x564271A: to_vec<rustc::mir::BasicBlockData> (slice.rs:380) #26: 0x564271A: clone<rustc::mir::BasicBlockData> (vec.rs:1642) #27: 0x564271A: clone<rustc::mir::BasicBlock,rustc::mir::BasicBlockData> ( indexed_vec.rs:499) #28: 0x564271A: <rustc::mir::Mir as core::clone::Clone>::clone (mod.rs:90) There are three "real" stack frames and 25 inlined stack frames. Those inline stack frames are the difference between "I have a vague idea of what is going on" and "I know exactly what is going on". Any changes that take us that direction are very useful :) Nick On Sat, 6 Apr 2019 at 03:34, Nathan Froyd <nfr...@mozilla.com> wrote: > TL;DR: We're making some changes to how inlined functions are handled > in our crash reports on non-Windows platforms in bug 524410. This > change should mostly result in more understandable crash stacks for > code that uses lots of inlining, and shouldn't make things any worse. > Some crash signatures may change as a result. If you have concerns, > or you happen to have crash stacks that you're curious about whether > they'd change under this new policy, please comment in bug 524410 or > email me. > > For the grotty details, read on. > > Modern C++/Rust code relies on inlining for efficiency, and modern > compilers have gotten very good at accommodating such code: it's not > unusual for code to feature double-digit levels of inlining (A inlined > into B inlined into C...inlined into J). A simple Rust function that > looks like: > > slice.into_iter().map(|...| { ... }) > > and you think of as spanning addresses BEGIN to END, might actually > feature small ranges of instructions from a dozen different functions, > and the compiler will (mostly) faithfully tell you a precise location > for each range. (Instruction A comes from some iterator code, > instruction B comes from a bit of your closure, instruction C comes > from some inlined function three levels deep inside of your > closure...) Unfortunately, this faithfulness means that in the event > of a crash, the crashing instruction might get attributed to some code > in libstd with no indication of how that relates to the original code. > > Bug 524410, supporting inline functions in the symbol dumper, has been > open for a decade now. The idea is that compilers (on Unix platforms, > not entirely sure this is true on Windows) will not only give you > precise information about what function particular instruction ranges > come from, they will also give you information about the chain of > inlining that resulted in those particular instructions. The symbol > dumper ought to be able to emit enough information to reconstruct > "frames" from inlined functions. That is, if you have: > > ``` > addr0 ---+ > ... | A > addr1 ---+ > addr2 ---+ ---+ ------+ > ... | B | | > addr3 ---+ | operator+ | > addr4 ---+ | | > ... | C | | > addr5 ---+ ---+ | DoTheThing > addr6 ---+ ---+ | > ... | D | | > addr7 ---+ | operator[] | > addr8 ---+ | | > ... | E | | > addr9 ---+ ---+ ------+ > ... > ``` > > (apologies if the ASCII art doesn't come through), and you're crashing > at `addr6`, the status quo is that you know you crashed in `D`, and > the next frame is whoever called `DoTheThing`. The ideal state of > affairs is that you're told that you crashed in `D`, called from > `operator[]`, called from `DoTheThing`, and so forth. > > We're not there yet. Changing the symbol dumper to emit this > information requires changing the symbol file format, which requires > some coordinated updates to several pieces of infrastructure (and > probably others that I don't know about). It also requires discussion > with upstream Breakpad (and therefore breaking many more consumers > than just Mozilla-internal ones) and/or forking Breakpad completely > and/or rewriting all our tools (which is a special case of forking). > We want to get to that end state, but it's a fair bit of work. > > The patches on the bug implement a truncated version of the above that > doesn't require dumping the entire inlining hierarchy. The idea is > that, as much as possible, for addresses in some function, you want to > attribute those addresses to source code lines for said function. So > instead of recording the most precise lines possible (A, B, C, D, E, > etc. in the above), you want to simply attribute the entire range of > B-E to `DoTheThing`. This transformation loses information, but it > tends to produce stack traces that make more sense to humans. You can > see some examples of the changes in: > > https://bugzilla.mozilla.org/show_bug.cgi?id=524410#c22 > https://bugzilla.mozilla.org/show_bug.cgi?id=524410#c29 > > which result in more sensible crash stacks, even if they don't > immediately point out what's going wrong. > > This work has not landed yet, but should land sometime next week. If > you have concerns, or you happen to have crash stacks that you're > curious about whether they'd change under this new policy, please > comment in bug 524410 or email me. > > Thanks, > -Nathan > _______________________________________________ > Stability mailing list > stabil...@mozilla.org > https://mail.mozilla.org/listinfo/stability > _______________________________________________ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform