On 2018-01-25 1:58 PM, Greg Clayton wrote:
On Jan 25, 2018, at 10:25 AM, Erik Pilkington <erik.pilking...@gmail.com> wrote:
Hi,
I'm not at all familiar with LLDB, but I've been doing some work on the
demangler in libcxxabi. It's still a work in progress and I haven't yet copied
the changes over to ItaniumDemangle, which AFAIK is what lldb uses. The
demangler in libcxxabi now demangles the symbol you attached in 3.31 seconds,
instead of 223.54 on my machine. I posted an RFC on my work here
(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but basically
the new demangler just produces an AST then traverses it to print the demangled
name.
Great to hear the huge speedup in demangling! LLDB actually has two demanglers:
a fast one that can demangle 99% of names, and we fall back to ItaniumDemangle
which can do all names but is really slow. It would be fun to compare your new
demangler with the fast one and see if we can get rid of the fast demangler now.
I think a good way of making this even faster is to have LLDB consume the AST
the demangler produces directly. The AST is a better representation of the
information that LLDB wants, and finishing the demangle and then fishing out
that information from the output string is unfortunate. From the AST, it would
be really straightforward to just individually print all the components of the
name that LLDB wants.
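
To make the idea concrete, here is a minimal sketch of printing individual components straight from an AST instead of re-parsing a finished string. The types and functions (NameNode, FunctionNode, makeFunction, baseName, containingContext) are hypothetical and heavily simplified; they are not the node classes libcxxabi actually uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST for a demangled function name.
// These are illustrative types, not the node classes used by libcxxabi.
struct NameNode {
    std::string identifier;               // e.g. "baz"
    std::unique_ptr<NameNode> qualifier;  // enclosing scope, null at top level
};

struct FunctionNode {
    std::unique_ptr<NameNode> name;
    std::vector<std::string> params;  // parameter types, kept apart from the name
    bool is_const = false;            // trailing "const" qualifier
};

// Builds the AST for foo::bar::baz from {"foo", "bar", "baz"}.
FunctionNode makeFunction(const std::vector<std::string>& parts, bool is_const) {
    assert(!parts.empty());
    std::unique_ptr<NameNode> current;
    for (const std::string& part : parts) {
        auto node = std::make_unique<NameNode>();
        node->identifier = part;
        node->qualifier = std::move(current);
        current = std::move(node);
    }
    FunctionNode fn;
    fn.name = std::move(current);
    fn.is_const = is_const;
    return fn;
}

// The base name is the innermost identifier; nothing else gets printed.
std::string baseName(const FunctionNode& fn) { return fn.name->identifier; }

// The containing context is the chain of qualifiers, outermost first.
std::string containingContext(const FunctionNode& fn) {
    std::vector<const NameNode*> chain;
    for (const NameNode* n = fn.name->qualifier.get(); n != nullptr;
         n = n->qualifier.get()) {
        chain.push_back(n);
    }
    std::string out;
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
        if (!out.empty()) out += "::";
        out += (*it)->identifier;
    }
    return out;
}
```

With a shape like this, asking for the base name or the context never touches the parameter list, which is where the expensive expansions usually live.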
This would help us to grab the important bits out of the mangled name as well.
We chop up a demangled name to find the base name ("string" for std::string)
and the containing context ("std::" for std::string), and we check whether we
can tell if the function is a method (by looking for a trailing "const"
modifier on the function) versus a top-level function. The mangling doesn't
fully specify what is a namespace and what is a class: in "foo::bar::baz()" we
don't know if "foo" or "bar" are classes or namespaces. So the AST would be
great as long as it is fast.
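
For readers unfamiliar with this chopping step, the following is a rough sketch of the string-based approach described above. It is not LLDB's actual parser: chopDemangledName and ChoppedName are made-up names, and the sketch assumes the demangled string has no leading return type and no operator names containing '<', '>' or '('.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ChoppedName {
    std::string context;   // "foo::bar" for "foo::bar::baz(int) const"
    std::string basename;  // "baz"
    bool has_trailing_const = false;
};

// Splits a demangled name at the last "::" that sits outside template
// arguments and before the parameter list. The context is returned without
// the trailing "::" separator.
ChoppedName chopDemangledName(const std::string& demangled) {
    int angle_depth = 0;
    std::size_t params_begin = demangled.size();
    std::size_t last_scope = std::string::npos;
    for (std::size_t i = 0; i < demangled.size(); ++i) {
        const char c = demangled[i];
        if (c == '<') {
            ++angle_depth;
        } else if (c == '>') {
            --angle_depth;
        } else if (angle_depth == 0) {
            if (c == '(') {
                params_begin = i;  // start of the parameter list
                break;
            }
            if (c == ':' && i + 1 < demangled.size() && demangled[i + 1] == ':') {
                last_scope = i;
                ++i;  // skip the second ':'
            }
        }
    }

    ChoppedName result;
    std::size_t name_begin = 0;
    if (last_scope != std::string::npos) {
        result.context = demangled.substr(0, last_scope);
        name_begin = last_scope + 2;
    }
    assert(name_begin <= params_begin);
    result.basename = demangled.substr(name_begin, params_begin - name_begin);

    // A trailing " const" is the hint that the function is a method.
    const std::string suffix = " const";
    result.has_trailing_const =
        demangled.size() >= suffix.size() &&
        demangled.compare(demangled.size() - suffix.size(), suffix.size(),
                          suffix) == 0;
    return result;
}
```

Note that the whole demangled string has to exist before any of this can run, which is exactly the cost an AST-based approach would avoid.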
Most of the time it takes to demangle these "symbols from hell" is during the
printing, after the AST has been parsed, because the demangler has to flatten out all the
potentially nested back references. Just parsing to an AST should be about proportional
to the strlen of the mangled name. Since (AFAIK) LLDB doesn't use some sections of the
demangled name often (such as parameters), from the AST LLDB could lazily decide not to
even bother fully demangling some sections of the name, then if it ever needs them it
could parse a new AST and get them from there. I think this would largely fix the issue,
as most of the time these crazy expansions don't occur in the name itself, but in the
parameters or return type. Even when they do appear in the name, it would be possible to
do some simple name classification (i.e., does this symbol refer to a function) or pull out
the basename quickly without expanding anything at all.
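
A toy model may help show why printing dominates. In the sketch below, each node refers twice to the previous one, the way nested back references can, so the number of nodes grows linearly while the printed text roughly doubles per level. Node, buildChain, print and printedLength are illustrative names only and are not taken from the demangler.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// A node whose children may be shared with other nodes (back references).
struct Node {
    std::string text;
    std::vector<const Node*> children;
};

// Builds depth + 1 nodes: a leaf "int", then P<prev, prev> at every level.
std::vector<std::unique_ptr<Node>> buildChain(std::size_t depth) {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.push_back(std::make_unique<Node>(Node{"int", {}}));
    for (std::size_t i = 1; i <= depth; ++i) {
        const Node* prev = nodes.back().get();
        assert(prev != nullptr);
        nodes.push_back(std::make_unique<Node>(Node{"P", {prev, prev}}));
    }
    return nodes;
}

// Flattening: every shared child is expanded again at each use, so the
// output size is exponential in the depth.
std::string print(const Node& n) {
    std::string out = n.text;
    if (n.children.empty()) return out;
    out += '<';
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i != 0) out += ", ";
        out += print(*n.children[i]);
    }
    out += '>';
    return out;
}

// Computes the printed length without printing, visiting each node once.
std::size_t printedLengthMemo(const Node& n,
                              std::unordered_map<const Node*, std::size_t>& memo) {
    auto it = memo.find(&n);
    if (it != memo.end()) return it->second;
    std::size_t len = n.text.size();
    if (!n.children.empty()) {
        len += 2;                            // '<' and '>'
        len += 2 * (n.children.size() - 1);  // ", " separators
        for (const Node* child : n.children) len += printedLengthMemo(*child, memo);
    }
    memo[&n] = len;
    return len;
}

std::size_t printedLength(const Node& root) {
    std::unordered_map<const Node*, std::size_t> memo;
    return printedLengthMemo(root, memo);
}
```

In this model a chain of depth 20 has only 21 nodes, yet its flattened form is about 8 MB. A consumer that can ask for the length first, or that only prints the name portion, never has to pay for that expansion.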
Any thoughts? I'm really not at all familiar with LLDB, so I could have this
all wrong!
AST sounds great. We can put this into the class we use to chop up C++ names as
that is really our goal.
So it would be great to do a speed comparison between our fast demangler in
LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If yours is
faster, remove FastDemangle and then update the llvm::ItaniumDemangle() to use
your new code.
ASTs would be great for the C++ name parser.
Let us know what you are thinking,
Hi Greg,
I'm almost finished with my work on the demangler, hopefully I'll be
done within a few weeks. Once that's all finished I'll look into
exporting the AST and comparing it to FastDemangle. I was thinking about
adding a version of llvm::itaniumDemangle() that returns an opaque handle
to the AST and defining some functions on the LLVM side that take that
handle and return some extra information. I'd be happy to help out with
the LLDB side of things too, although it might be better if someone more
experienced with LLDB did this.
I'll ping this thread when I'm finished with the demangler, then we can
hopefully work out what a good API for LLDB would be.
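
As a purely illustrative sketch of the opaque-handle shape described above: callers only see a forward-declared type plus a few query functions. Every name here (ParsedName, parseNestedName, baseNameOf, contextOf, isConstMethod, freeParsedName) is hypothetical and is not an existing LLVM API. The parser behind the handle is a stand-in that only understands plain Itanium nested names of the form _ZN[K]<len><id>...E with no templates or substitutions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// --- What a client would see (all names hypothetical) ---
struct ParsedName;  // opaque to callers
ParsedName* parseNestedName(const char* mangled);  // nullptr if unsupported
void freeParsedName(ParsedName* name);
const char* baseNameOf(const ParsedName* name);
const char* contextOf(const ParsedName* name);
bool isConstMethod(const ParsedName* name);

// --- Stand-in implementation hidden behind the handle ---
struct ParsedName {
    std::string context;
    std::string base_name;
    bool is_const = false;
};

ParsedName* parseNestedName(const char* mangled) {
    assert(mangled != nullptr);
    const std::string s(mangled);
    if (s.compare(0, 3, "_ZN") != 0) return nullptr;
    std::size_t pos = 3;
    ParsedName parsed;
    if (pos < s.size() && s[pos] == 'K') {  // const-qualified member function
        parsed.is_const = true;
        ++pos;
    }
    bool saw_component = false;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        std::size_t len = 0;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
            len = len * 10 + static_cast<std::size_t>(s[pos] - '0');
            if (len > s.size()) return nullptr;  // length cannot be valid
            ++pos;
        }
        if (len == 0 || len > s.size() - pos) return nullptr;
        if (saw_component) {  // the previous identifier was a scope
            if (!parsed.context.empty()) parsed.context += "::";
            parsed.context += parsed.base_name;
        }
        parsed.base_name = s.substr(pos, len);
        saw_component = true;
        pos += len;
    }
    if (!saw_component || pos >= s.size() || s[pos] != 'E') return nullptr;
    return new ParsedName(parsed);
}

void freeParsedName(ParsedName* name) { delete name; }
const char* baseNameOf(const ParsedName* name) { return name->base_name.c_str(); }
const char* contextOf(const ParsedName* name) { return name->context.c_str(); }
bool isConstMethod(const ParsedName* name) { return name->is_const; }
```

The appeal of this shape is that the AST representation can keep changing without breaking clients, since they only depend on the query functions.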
Thanks,
Erik
Greg
Thanks,
Erik
On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:
I have an issue where I am debugging a C++ binary that is around 250MB in size.
It contains some mangled names that are crazy:
_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
This de-mangles to something that is 72MB in size and takes 280 seconds (try running
"time c++filt -n" on the above string).
There are probably many symbols like this in this binary. Currently lldb will
de-mangle all names in the symbol table so that we can chop up the names so we
know function base names and we might be able to classify a base name as a
method or function for breakpoint categorization.
My question is: how do we work around such issues in LLDB? A few solutions I
can think of:
1 - time each name demangle and if it takes too long somehow stop de-mangling
similar symbols or symbols over a certain length?
2 - allow a setting that says "don't de-mangle names that start with..." and
the setting has a list of prefixes.
3 - have a setting that turns off de-mangling symbols over a certain length all
of the time with a default of something like 256 or 512
4 - modify our FastDemangler to abort if the de-mangled string goes over a
certain limit to avoid bad cases like this...
#1 would still mean we get a huge delay (like 280 seconds) when starting to
debug this binary, but might prevent multiple symbols from adding to that
delay...
#2 would require debugging once and then knowing which symbols took a
while to de-mangle. If we time each de-mangle, we can warn that there are large
mangled names and print the mangled name so the user might know?
#3 would disable de-mangling of long names at the risk of not de-mangling names
that are close to the limit
#4 requires that our FastDemangle code can decode the mangled string.
The fast de-mangler currently aborts on tricky de-mangling and we fall back
onto cxa_demangle from the C++ library, which does not have a cutoff on
length...
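
Options 3 and 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: demangleWithCutoff, DemangleResult, BoundedOutput and fakeDemangle are invented names, the demangler is passed in as a callback rather than being any real LLDB or libcxxabi entry point, and the default of 512 simply mirrors the number floated in option 3.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

struct DemangleResult {
    std::string name;  // demangled name, or the mangled name if skipped
    bool demangled;
};

// Option 3: never demangle names whose mangled form exceeds a cutoff.
DemangleResult demangleWithCutoff(
    const std::string& mangled,
    const std::function<std::string(const std::string&)>& demangle,
    std::size_t max_mangled_length = 512) {
    if (mangled.size() > max_mangled_length) return {mangled, false};
    return {demangle(mangled), true};
}

// Option 4: an output buffer that refuses to grow past a limit, so a
// demangler writing into it can abort as soon as append() returns false.
class BoundedOutput {
public:
    explicit BoundedOutput(std::size_t limit) : limit_(limit) {}

    bool append(const std::string& piece) {
        if (overflowed_ || piece.size() > limit_ - text_.size()) {
            overflowed_ = true;
            return false;
        }
        text_ += piece;
        assert(text_.size() <= limit_);
        return true;
    }

    bool overflowed() const { return overflowed_; }
    const std::string& text() const { return text_; }

private:
    std::size_t limit_;
    std::string text_;
    bool overflowed_ = false;
};

// Stand-in for a real demangler, used only to exercise the wrapper.
std::string fakeDemangle(const std::string& mangled) {
    return "demangled:" + mangled;
}
```

The length cutoff is cheap but blunt, since a short mangled name full of back references can still expand enormously; the bounded buffer catches those cases at the cost of having to thread the limit through the demangler's printing code.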
Can anyone else think of any other solutions?
Greg Clayton
_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev