Re: [lldb-dev] Huge mangled names are causing long delays when loading symbol table symbols

Erik Pilkington via lldb-dev Mon, 19 Mar 2018 18:51:13 -0700

I've put a WIP patch up here: https://reviews.llvm.org/D44668
Sorry for the delay!
Erik


On 2018-01-26 3:56 PM, Greg Clayton wrote:

On Jan 26, 2018, at 8:38 AM, Erik Pilkington <erik.pilking...@gmail.com> wrote:
On 2018-01-25 1:58 PM, Greg Clayton wrote:
On Jan 25, 2018, at 10:25 AM, Erik Pilkington <erik.pilking...@gmail.com> wrote:
Hi,
I'm not at all familiar with LLDB, but I've been doing some work on the demangler in libcxxabi. It's still a work in progress and I haven't yet copied the changes over to ItaniumDemangle, which AFAIK is what lldb uses. The demangler in libcxxabi now demangles the symbol you attached in 3.31 seconds, instead of 223.54 seconds on my machine. I posted an RFC on my work here (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114448.html), but basically the new demangler just produces an AST then traverses it to print the demangled name.
Great to hear the huge speedup in demangling! LLDB actually has two demanglers: a fast one that can demangle 99% of names, and we fall back to ItaniumDemangle which can do all names but is really slow. It would be fun to compare your new demangler with the fast one and see if we can get rid of the fast demangler now.
I think a good way of making this even faster is to have LLDB consume the AST the demangler produces directly. The AST is a better representation of the information that LLDB wants, and finishing the demangle and then fishing out that information from the output string is unfortunate. From the AST, it would be really straightforward to just individually print all the components of the name that LLDB wants.
This would help us to grab the important bits out of the mangled name as well. We chop up a demangled name to find the base name ("string" for std::string) and the containing context ("std::" for std::string), and we check whether we can tell if the function is a method (look for a trailing "const" modifier on the function) versus a top-level function, since the mangling doesn't fully specify what is a namespace and what is a class (in "foo::bar::baz()" we don't know if "foo" or "bar" are classes or namespaces). So the AST would be great as long as it is fast.
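As a rough illustration of the string chopping described above, here is a minimal Python sketch. The function name split_demangled_name is hypothetical and this is not LLDB's actual name parser; it assumes a plain demangled function name with no leading return type and no operator names such as operator< or operator().

```python
def split_demangled_name(demangled):
    """Return (context, basename, is_const_method) for a demangled name.

    Hypothetical helper for illustration only; it ignores return types
    and operator names.
    """
    # Find the start of the parameter list: the first '(' that is not
    # nested inside template angle brackets.
    depth = 0
    paren = len(demangled)
    for i, ch in enumerate(demangled):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "(" and depth == 0:
            paren = i
            break
    qualified = demangled[:paren]

    # Find the last "::" that is outside any template argument list.
    depth = 0
    split = -1
    for i, ch in enumerate(qualified):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0 and qualified[i:i + 2] == "::":
            split = i

    if split >= 0:
        context = qualified[:split + 2]
        basename = qualified[split + 2:]
    else:
        context = ""
        basename = qualified

    # A trailing "const" after the parameter list means the function must
    # be a method; without it, the string alone cannot tell us either way.
    is_const_method = demangled[paren:].rstrip().endswith(") const")
    return context, basename, is_const_method
```

For "foo::bar::baz() const" this returns the context "foo::bar::", the basename "baz", and True for the method check. For "foo::bar::baz()" the last value is False, which only means "unknown", matching the ambiguity described in the message.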
Most of the time it takes to demangle these "symbols from hell" is during the printing, after the AST has been parsed, because the demangler has to flatten out all the potentially nested back references. Just parsing to an AST should be about proportional to the strlen of the mangled name. Since (AFAIK) LLDB doesn't use some sections of the demangled name often (such as parameters), from the AST LLDB could lazily decide not to even bother fully demangling some sections of the name, then if it ever needs them it could parse a new AST and get them from there. I think this would largely fix the issue, as most of the time these crazy expansions don't occur in the name itself, but in the parameters or return type. Even when they do appear in the name, it would be possible to do some simple name classification (i.e., does this symbol refer to a function) or pull out the basename quickly without expanding anything at all.
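To make the parse-versus-print cost gap concrete, here is a small Python sketch. It is a toy model, not the libcxxabi AST: Node, expand, printed_length and build_nested are hypothetical names, and a shared child node stands in for a back reference. The tree has only a handful of nodes, but the flattened text doubles with each level of nesting.

```python
class Node:
    """Toy AST node; sharing one child object models a back reference."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = tuple(children)


def expand(node):
    """Flatten the node into text, re-expanding every shared child."""
    if not node.children:
        return node.name
    return node.name + "<" + ", ".join(expand(c) for c in node.children) + ">"


def printed_length(node, cache=None):
    """Length of expand(node), computed once per distinct node."""
    if cache is None:
        cache = {}
    key = id(node)
    if key not in cache:
        size = len(node.name)
        if node.children:
            # "<" and ">" plus a ", " separator between children.
            size += 2 + 2 * (len(node.children) - 1)
            size += sum(printed_length(c, cache) for c in node.children)
        cache[key] = size
    return cache[key]


def build_nested(levels):
    """Each level refers twice to the previous one, like S_ back references."""
    node = Node("int")
    for _ in range(levels):
        node = Node("P", [node, node])
    return node
```

Here build_nested(40) holds 41 nodes, yet printed_length reports a flattened size of several terabytes without ever building the string. A consumer that walks the tree this way could decide up front whether a full expansion is worth doing.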
Any thoughts? I'm really not at all familiar with LLDB, so I could have this all wrong!
AST sounds great. We can put this into the class we use to chop up C++ names as that is really our goal.
So it would be great to do a speed comparison between our fast demangler in LLDB (in FastDemangle.cpp/.h) and your updated libcxxabi version. If yours is faster, remove FastDemangle and then update llvm::itaniumDemangle() to use your new code.
ASTs would be great for the C++ name parser.

Let us know what you are thinking,
Hi Greg,
I'm almost finished with my work on the demangler; hopefully I'll be done within a few weeks. Once that's all finished I'll look into exporting the AST and comparing it to FastDemangle. I was thinking about adding a version of llvm::itaniumDemangle() that returns an opaque handle to the AST and defining some functions on the LLVM side that take that handle and return some extra information. I'd be happy to help out with the LLDB side of things too, although it might be better if someone more experienced with LLDB did this.
Can't wait! The only reason we switched away from the libcxxabi demangler in the first place was the poor performance. GDB's demangler was 3x faster. Our FastDemangler got us back to the speed of the GDB demangler. But it will be great to get back to one fast demangler.
It would be great if there was some way to implement the demangled name size cutoff in the demangler, where if the demangled name goes over some max size we can just stop demangling. No one needs to see a 72MB string, nor would anyone ever type in that name.
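One way such a cutoff could work is to bound the output buffer and abandon printing as soon as the bound is crossed. The Python sketch below is only an illustration under assumed names (BoundedOutput, OutputLimitExceeded, print_type and demangle_with_cutoff are all hypothetical), and it uses (name, children) tuples as a stand-in for a real demangler AST.

```python
class OutputLimitExceeded(Exception):
    """Raised when the demangled text would exceed the configured cap."""


class BoundedOutput:
    """Append-only text buffer that refuses to grow past max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.parts = []
        self.size = 0

    def write(self, text):
        if self.size + len(text) > self.max_size:
            raise OutputLimitExceeded(self.max_size)
        self.parts.append(text)
        self.size += len(text)

    def getvalue(self):
        return "".join(self.parts)


def print_type(tree, out):
    """Print a (name, children) tuple as name<child, child, ...>."""
    name, children = tree
    out.write(name)
    if children:
        out.write("<")
        for index, child in enumerate(children):
            if index:
                out.write(", ")
            print_type(child, out)
        out.write(">")


def demangle_with_cutoff(tree, max_size):
    """Return the printed name, or None if it would exceed max_size."""
    out = BoundedOutput(max_size)
    try:
        print_type(tree, out)
    except OutputLimitExceeded:
        return None
    return out.getvalue()
```

Because the check happens on every write, the work done before giving up is proportional to the cap rather than to the full expanded size.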
If you can get the new demangler features (AST + demangling) into llvm::itaniumDemangle I will be happy to do the LLDB side of the work.
I'll ping this thread when I'm finished with the demangler, then we can hopefully work out what a good API for LLDB would be.
It would be great to put all the functionality into LLVM and test the functionality in llvm tests. Then I will port over to LLDB as needed. As Jim said, we want to know the function basename, and whether a function is a C++ method or just a top-level function or possibly both (we often don't know just from mangling if foo::bar() is a method or function since we don't know if "foo" is a namespace, but if we have "foo::bar() const", then we know it is a method).
Look forward to seeing what you come up with!

Greg
Thanks,
Erik
Greg
Thanks,
Erik


On 2018-01-24 6:48 PM, Greg Clayton via lldb-dev wrote:
I have an issue where I am debugging a C++ binary that is around 250MB in size. It contains some mangled names that are crazy:
_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_
This de-mangles to something that is 72MB in size and takes 280 seconds (try running "time c++filt -n" on the above string).
There are probably many symbols like this in this binary. Currently lldb will de-mangle all names in the symbol table so that we can chop up the names so we know function base names and we might be able to classify a base name as a method or function for breakpoint categorization.
My question is: how do we work around such issues in LLDB? A few solutions I can think of:
1 - time each name demangle and if it takes too long somehow stop de-mangling similar symbols or symbols over a certain length?
2 - allow a setting that says "don't de-mangle names that start with..." and the setting has a list of prefixes.
3 - have a setting that turns off de-mangling symbols over a certain length all of the time with a default of something like 256 or 512
4 - modify our FastDemangler to abort if the de-mangled string goes over a certain limit to avoid bad cases like this...
#1 would still mean we get a huge delay (like 280 seconds) when starting to debug this binary, but might prevent multiple symbols from adding to that delay...
#2 would require debugging once and then knowing which symbols took a while to de-mangle. If we time each de-mangle, we can warn that there are large mangled names and print the mangled name so the user might know?
#3 would disable de-mangling of long names at the risk of not de-mangling names that are close to the limit
#4 requires that our FastDemangle code can decode the mangled string. The fast de-mangler currently aborts on tricky de-mangling and we fall back onto cxa_demangle from the C++ library which does not have a cutoff on length...
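Options 1 to 3 can be sketched as a small pre-filter plus a timing wrapper. The Python below is a hypothetical illustration, not an existing LLDB setting or API: should_demangle, timed_demangle and their parameters are invented names, and the demangle argument is any callable supplied by the caller.

```python
import time


def should_demangle(mangled, max_length=512, skip_prefixes=()):
    """Options 2 and 3: skip names that are too long or match a prefix."""
    if len(mangled) > max_length:
        return False
    return not any(mangled.startswith(prefix) for prefix in skip_prefixes)


def timed_demangle(mangled, demangle, slow_threshold_seconds=1.0, slow_names=None):
    """Option 1: time one demangle call and record names that were slow.

    `demangle` is a caller-supplied callable; `slow_names` is an optional
    list that collects mangled names worth warning the user about.
    """
    start = time.monotonic()
    result = demangle(mangled)
    elapsed = time.monotonic() - start
    if slow_names is not None and elapsed > slow_threshold_seconds:
        slow_names.append(mangled)
    return result
```

A caller would check should_demangle first and only then call timed_demangle, so that names recorded as slow on one run can feed the prefix list for the next. As noted for #1, timing alone does not avoid the first long delay; it only makes the offending names visible.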
Can anyone else think of any other solutions?

Greg Clayton






_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
