Hi Aaron,

We already discussed this on IRC, but just for the record.

On Mon, Feb 19, 2024 at 11:20:13PM -0500, Aaron Merey wrote:
> On Tue, Feb 13, 2024 at 8:28 AM Mark Wielaard <m...@klomp.org> wrote:
> >
> > > This patch's method of building the aranges list is slower than simply
> > > reading .debug_aranges. On my machine, running eu-stack on a 2.9G
> > > firefox core file takes about 8.7 seconds with this patch applied,
> > > compared to about 3.3 seconds without this patch.
> >
> > That is significant. 2.5 times slower.
> > Did you check with perf or some other profiler where exactly the extra
> > time goes. Does the new method find more aranges (and so produces
> > "better" backtraces)?
>
> I took another look at the performance and realized I made a silly
> mistake when I originally tested this. My build that was 2.5x slower
> was compiled with -O0 but I tested it against an -O2 build. Oops!
>
> With the optimization level set to -O2 in all cases, the runtime of
> 'eu-stack -s' on the original 2.9G firefox core file is only about
> 9% slower: 3.6 seconds with the patch applied compared to 3.3
> seconds without the patch.

OK, still a slowdown, but 9% is much more reasonable given we are
doing more work now. Good.

> As for the number of aranges found, there is a difference for libxul.so:
> 250435 with the patch compared to 254832 without. So 4397 fewer aranges
> are found when using the new CU iteration method. I'll dig into this and
> see if there is a problem or if it's just due to some redundancy in
> libxul's .debug_aranges. FWIW there was no change to the aranges counts
> for the other modules searched during this eu-stack firefox corefile test.

A quick way to see where the differences are is to run
eu-readelf --debug-dump=decodedaranges before and after your patch.

This is the opposite of what I expected. I had expected more ranges,
not fewer. The difference is less than 2%, but it would still be
interesting to know what differs and why. Were there any differences
in the backtraces? If not, then those ranges might not actually have
been mapping to code.

> > Might it be an idea to leave dwarf_getaranges as it is and introduce a
> > new (internal) function to get "dynamic" ranges? It looks like what
> > programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> > and dwfl_module_addrdie. These are currently build on dwarf_getaranges,
> > but could maybe use a new interface?
>
> IMO this depends on what users expect from dwarf_getaranges. Do they
> want the exact contents of .debug_aranges (whether or not it's complete)
> or should dwarf_getaranges go beyond .debug_aranges to ensure the most
> complete results?
>
> The comment for dwarf_getaranges in libdw.h simply reads "Return list
> address ranges". Since there's no mention of .debug_aranges specifically,
> I think it's fair if dwarf_getaranges does whatever it can to ensure
> comprehensive results. In which case dwarf_getaranges should probably
> dynamically generate aranges.

You might be right that no user really cares. But as seen in the
eu-readelf code, it might also be that people expected it to map to
the ranges from .debug_aranges.

So I would be happier if we just kept the dwarf_getaranges code as is
and only changed the code in dwarf_addrdie and dwfl_module_addrdie.

We could then also introduce a new public function, dwarf_getdieranges
(?), that does the new thing. But it doesn't have to be public on the
first try, as long as dwarf_addrdie and dwfl_module_addrdie work. (We
might want to change the interface of dwarf_getdieranges so it can be
"lazy", for example.)

Cheers,

Mark