Bisecting the performance regression would be extremely valuable. If you want to do that, it would be much appreciated.
On 12 April 2017 at 20:39, Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:

> For my app I think it's largely parsing debug symbol tables for shared
> libraries. My main performance improvement was to increase the parallelism
> of parsing that information.
>
> Funny, gdb/gold has a similar accelerator table (created when you link
> with --gdb-index). I assume lldb doesn't know how to parse it.
>
> I'll work on bisecting the change.
>
> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda <ja...@molenda.com> wrote:
>
>> I don't know exactly when the 3.9 / 4.0 branches were cut, or what was
>> done between those two points, but in general we don't expect or want to
>> see performance regressions like that. I'm more familiar with the perf
>> characteristics on macOS; Linux is different in some important regards,
>> so I can only speak in general terms here.
>>
>> In your example, you're measuring three things, assuming you have debug
>> information for MY_PROGRAM. The first is "do the initial read of the
>> main binary and its debug information". The second is "find all symbols
>> named 'main'". The third is "scan a newly loaded solib's symbols"
>> (assuming you don't have debug information for solibs from /usr/lib
>> etc.). Technically there's some additional work here -- launching the
>> process, detecting solibs as they're loaded, looking up the symbol
>> context when we hit the breakpoint, backtracing a frame or two, etc. --
>> but that is rarely where you'll see perf issues in a local debug
>> session.
>>
>> Which of these is likely to be important depends on your MY_PROGRAM.
>> If it is 'int main(){}', it's not going to be DWARF parsing. If your
>> binary only pulls in three solibs by the time it is running, it's not
>> going to be new-module scanning. A popular place to spend startup time
>> is C++ name demangling, if you have a lot of solibs with C++ symbols.
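As an aside, it is easy to check which accelerator-table sections, if any, a given binary actually carries. The section names below are the conventional ones (`.gdb_index` from gold's `--gdb-index`, the apple_* tables from clang, `.debug_names` from DWARF 5), but whether your toolchain emits them is build-dependent, so treat this as a sketch:

```shell
# List any DWARF accelerator-table sections present in the binary.
# Prints nothing if the binary has none (plain DWARF only).
readelf -S ./MY_PROGRAM | grep -E 'apple_(names|types)|gdb_index|debug_names'
```

On Mach-O binaries the equivalent check is against segment/section listings (the tables appear as `__apple_names` etc.), so the grep pattern above still matches.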
>>
>> On Darwin systems, we have a nonstandard accelerator table in the DWARF
>> emitted by clang, which lldb reads: the "apple_types", "apple_names",
>> etc. tables. So when we need to find a symbol named "main", for Modules
>> that have a SymbolFile we can look in the accelerator table. If that
>> SymbolFile has a 'main', the accelerator table gives us a reference into
>> the DWARF for the definition, and we can consume the DWARF lazily. We
>> should never need to do a full scan over the DWARF; that's considered a
>> failure.
>>
>> (In fact, I'm working on a branch of the llvm.org sources from
>> mid-October, and I suspect Darwin lldb is often consuming a LOT more
>> DWARF than it should be when I'm debugging. I need to figure out what is
>> causing that; it's a big problem.)
>>
>> In general, I've been wanting to add a new "perf counters"
>> infrastructure and testsuite to lldb, but haven't had time. One thing I
>> work on a lot is debugging over a Bluetooth connection; it turns out
>> that BT is very slow, and any extra packets we send between lldb and
>> debugserver are very costly. The communication is so fast over
>> localhost, or over a USB cable, that it's easy for regressions to sneak
>> in without anyone noticing. So the original idea was: hey, we can have
>> something that counts packets for distinct operations. Like, this "next"
>> command should take no more than 40 packets, that kind of thing. And it
>> could be expanded -- "b main should fully parse the DWARF for only 1
>> symbol", or "p *this should only look up 5 types", etc.
>>
>> > On Apr 12, 2017, at 11:26 AM, Scott Smith via lldb-dev
>> > <lldb-dev@lists.llvm.org> wrote:
>> >
>> > I worked on some performance improvements for lldb 3.9, and was about
>> > to forward-port them so I can submit them for inclusion, but I
>> > realized there has been a major performance drop from 3.9 to 4.0. I am
>> > using the official builds on an Ubuntu 16.04 machine with 16 cores /
>> > 32 hyperthreads.
>> >
>> > Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null
>> >
>> > With 3.9, I get:
>> > real 0m31.782s
>> > user 0m50.024s
>> > sys  0m4.348s
>> >
>> > With 4.0, I get:
>> > real 0m51.652s
>> > user 1m19.780s
>> > sys  0m10.388s
>> >
>> > (With my changes + 3.9, I got real down to 4.8 seconds! But I'm not
>> > convinced you'll like all the changes.)
>> >
>> > Is this expected? I get roughly the same results when compiling
>> > llvm+lldb from source.
>> >
>> > I guess I can spend some time trying to bisect what happened. 5.0
>> > looks to be another 8% slower.
>> >
>> > _______________________________________________
>> > lldb-dev mailing list
>> > lldb-dev@lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
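Returning to the packet-counting idea above: a rough version is already possible with lldb's gdb-remote packet log. `log enable gdb-remote packets` is a real lldb command, but the exact log-line wording (and hence the grep pattern below) varies between lldb versions, so this is a sketch rather than a precise counter:

```shell
# Log every gdb-remote packet for one scripted session, then count
# the packets lldb sent to debugserver/lldb-server.
lldb -b \
  -o 'log enable gdb-remote packets -f /tmp/packets.log' \
  -o 'b main' -o 'run' ./MY_PROGRAM > /dev/null
grep -c 'send packet' /tmp/packets.log
```

Comparing that count across two lldb versions for the same command sequence gives a crude per-operation packet budget of the kind described in the message above.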