Thank you very much for tracking this down. +1 for making UniqueCStringMap speak ConstString -- I think it just makes sense given that it already has "unique" in the name.

ConstString already has a GetStringRef accessor. Adding a conversion operator may also be a good idea, although it probably won't help in all situations (you'll still have to write StringRef(X).drop_front() etc. if you want to do StringRef operations on the string).
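For concreteness, a rough (untested) sketch of what that could look like -- GetStringRef is the existing accessor, the conversion operator is the proposed addition, everything else is purely illustrative:

    // Illustrative sketch only -- not actual lldb source.
    #include "llvm/ADT/StringRef.h"

    namespace lldb_private {
    class ConstString {
    public:
      explicit ConstString(const char *cstr);                      // existing (roughly)
      llvm::StringRef GetStringRef() const;                        // existing accessor
      operator llvm::StringRef() const { return GetStringRef(); }  // proposed addition
    };
    } // namespace lldb_private

    void Lookup(llvm::StringRef name);      // stand-in for any StringRef-taking API

    void Example() {
      lldb_private::ConstString cs("main");
      Lookup(cs);                           // compiles via the implicit conversion
      llvm::StringRef(cs).drop_front();     // StringRef operations still need the explicit spelling
    }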
pl

On 20 April 2017 at 01:46, Zachary Turner <ztur...@google.com> wrote:

> It doesn't entirely defeat the purpose, it's just not *as good* as making the interfaces take ConstStrings. StringRef already has a lot of safety and usability improvements over raw char pointers, and those improvements don't disappear just because you aren't using ConstString. Although I agree that if you can make it work where the interface only accepts and returns ConstStrings, and make conversion from ConstString to StringRef more seamless, that would be an even better improvement.
>
> On Wed, Apr 19, 2017 at 5:33 PM Scott Smith <scott.sm...@purestorage.com> wrote:
>
>> If I just assume the pointers are from ConstString, then doesn't that defeat the purpose of making the interface safer? Why not use an actual ConstString and provide conversion operators from ConstString to StringRef? Seems we should be able to rely on the type system to get us safety and performance.
>>
>> I'll try putting something together tomorrow.
>>
>> On Wed, Apr 19, 2017 at 4:48 PM, Zachary Turner <ztur...@google.com> wrote:
>>
>>> The change was made to make the interface safer and allow propagation of StringRef through other layers. The previous code was already taking a const char *, and so it was working under the assumption that the const char * passed in came from a ConstString. As such, continuing to make that same assumption seems completely reasonable.
>>>
>>> So perhaps you can just change the operator to compare the pointers, as was being done before.
>>>
>>> On Wed, Apr 19, 2017 at 4:24 PM Scott Smith <scott.sm...@purestorage.com> wrote:
>>>
>>>> It looks like it was this change:
>>>>
>>>>   commit 45fb8d00309586c3f7027f66f9f8a0b56bf1cc4a
>>>>   Author: Zachary Turner <ztur...@google.com>
>>>>   Date:   Thu Oct 6 21:22:44 2016 +0000
>>>>
>>>>       Convert UniqueCStringMap to use StringRef.
>>>>
>>>>       git-svn-id: https://llvm.org/svn/llvm-project/lldb/trunk@283494 91177308-0d34-0410-b5e6-96231b3b80d8
>>>>
>>>> I'm guessing it's because the old code assumed ConstString, which meant that uniqueness comparisons could be done by simply comparing the pointers. Now it needs to use an actual string comparison routine. This code:
>>>>
>>>>   bool operator<(const Entry &rhs) const { return cstring < rhs.cstring; }
>>>>
>>>> didn't actually change in the revision, but cstring went from 'const char *' to 'StringRef'. If you know for sure that all the StringRefs come from ConstString, then it'd be easy enough to change the comparison, but I don't know how you guarantee that.
>>>>
>>>> I assume the change was made to allow proper memory cleanup when the symbols are discarded?
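To make that concrete: if it is safe to assume every StringRef in the map was produced by ConstString, the pointer-order comparison discussed above could be restored roughly like this (a sketch, not what is currently in the tree):

    // Sketch only. Assumes every 'cstring' comes from ConstString, so equal
    // strings share one uniqued pointer and pointer order gives a consistent
    // (if arbitrary) ordering -- no character-by-character comparison needed.
    #include "llvm/ADT/StringRef.h"

    struct Entry {
      llvm::StringRef cstring;
      bool operator<(const Entry &rhs) const {
        return cstring.data() < rhs.cstring.data();
      }
    };

Switching the key type to ConstString itself would let the type system enforce that assumption instead of relying on callers to uphold it.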
>>>> On Thu, Apr 13, 2017 at 5:37 AM, Pavel Labath <lab...@google.com> wrote:
>>>>
>>>>> Bisecting the performance regression would be extremely valuable. If you want to do that, it would be very appreciated.
>>>>>
>>>>> On 12 April 2017 at 20:39, Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>>>>
>>>>>> For my app I think it's largely parsing debug symbol tables for shared libraries. My main performance improvement was to increase the parallelism of parsing that information.
>>>>>>
>>>>>> Funny, gdb/gold has a similar accelerator table (created when you link with -gdb-index). I assume lldb doesn't know how to parse it.
>>>>>>
>>>>>> I'll work on bisecting the change.
>>>>>>
>>>>>> On Wed, Apr 12, 2017 at 12:26 PM, Jason Molenda <ja...@molenda.com> wrote:
>>>>>>
>>>>>>> I don't know exactly when the 3.9 / 4.0 branches were cut, and what was done between those two points, but in general we don't expect/want to see performance regressions like that. I'm more familiar with the perf characteristics on macOS; Linux is different in some important regards, so I can only speak in general terms here.
>>>>>>>
>>>>>>> In your example, you're measuring three things, assuming you have debug information for MY_PROGRAM. The first is "do the initial read of the main binary and its debug information". The second is "find all symbols named 'main'". The third is "scan a newly loaded solib's symbols" (assuming you don't have debug information for solibs from /usr/lib etc.). Technically there's some additional stuff here -- launching the process, detecting solibs as they're loaded, looking up the symbol context when we hit the breakpoint, backtracing a frame or two, etc. -- but that stuff is rarely where you'll see perf issues in a local debug session.
>>>>>>>
>>>>>>> Which of these is likely to be important will depend on your MY_PROGRAM. If you have an 'int main(){}', it's not going to be DWARF parsing. If your binary only pulls in three solibs by the time it is running, it's not going to be new module scanning. A popular place to spend startup time is in C++ name demangling if you have a lot of solibs with C++ symbols.
>>>>>>>
>>>>>>> On Darwin systems, we have a nonstandard accelerator table in our DWARF emitted by clang that lldb reads: the "apple_types", "apple_names" etc. tables. So when we need to find a symbol named "main", for Modules that have a SymbolFile, we can look in the accelerator table. If that SymbolFile has a 'main', the accelerator table gives us a reference into the DWARF for the definition, and we can consume the DWARF lazily. We should never need to do a full scan over the DWARF; that's considered a failure.
>>>>>>>
>>>>>>> (In fact, I'm working on a branch of the llvm.org sources from mid-October and I suspect Darwin lldb is often consuming a LOT more DWARF than it should be when I'm debugging. I need to figure out what is causing that; it's a big problem.)
>>>>>>>
>>>>>>> In general, I've been wanting to add a new "perf counters" infrastructure & testsuite to lldb, but haven't had time. One thing I work on a lot is debugging over a bluetooth connection; it turns out that BT is very slow, and any extra packets we send between lldb and debugserver are very costly. The communication is so fast over a local host, or over a usb cable, that it's easy for regressions to sneak in without anyone noticing. So the original idea was to have something that counts packets for distinct operations. Like, this "next" command should take no more than 40 packets, that kind of thing. And it could be expanded -- "b main should fully parse the DWARF for only 1 symbol", or "p *this should only look up 5 types", etc.
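To illustrate the packet-budget idea, something along these lines (every name below is invented for the example -- nothing like this exists in lldb today):

    // Invented-for-illustration packet counter; a real version would hook the
    // GDB-remote communication layer.
    #include <cassert>
    #include <cstdint>
    #include <string>

    struct PacketCounter {
      uint64_t sent = 0;
      void OnPacketSent(const std::string & /*payload*/) { ++sent; }
    };

    // A perf test could then assert a budget for a single operation:
    void TestNextPacketBudget(PacketCounter &counter) {
      uint64_t before = counter.sent;
      // ... issue one "next" command through the command interpreter ...
      uint64_t used = counter.sent - before;
      assert(used <= 40 && "'next' should take no more than 40 packets");
    }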
>>>>>>>
>>>>>>> > On Apr 12, 2017, at 11:26 AM, Scott Smith via lldb-dev <lldb-dev@lists.llvm.org> wrote:
>>>>>>> >
>>>>>>> > I worked on some performance improvements for lldb 3.9, and was about to forward-port them so I can submit them for inclusion, but I realized there has been a major performance drop from 3.9 to 4.0. I am using the official builds on an Ubuntu 16.04 machine with 16 cores / 32 hyperthreads.
>>>>>>> >
>>>>>>> > Running: time lldb-4.0 -b -o 'b main' -o 'run' MY_PROGRAM > /dev/null
>>>>>>> >
>>>>>>> > With 3.9, I get:
>>>>>>> > real    0m31.782s
>>>>>>> > user    0m50.024s
>>>>>>> > sys     0m4.348s
>>>>>>> >
>>>>>>> > With 4.0, I get:
>>>>>>> > real    0m51.652s
>>>>>>> > user    1m19.780s
>>>>>>> > sys     0m10.388s
>>>>>>> >
>>>>>>> > (With my changes + 3.9, I got real down to 4.8 seconds! But I'm not convinced you'll like all the changes.)
>>>>>>> >
>>>>>>> > Is this expected? I get roughly the same results when compiling llvm+lldb from source.
>>>>>>> >
>>>>>>> > I guess I can spend some time trying to bisect what happened. 5.0 looks to be another 8% slower.
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev