[lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

2017-12-15 Thread xgsa via lldb-dev
Hi,

I am working on an issue where, in C++ programs, showing the dynamic type based
on RTTI in lldb doesn't work properly for some complex cases involving
templates. Consider the following example:

enum class TagType : bool {
    Tag1
};

struct I {
    virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I {
    private:
    int v = 123;
};

int main(int argc, const char * argv[]) {
    Impl<TagType::Tag1> impl;
    I& i = impl;
    return 0;
}

For this example clang generates the type name "Impl<TagType::Tag1>" in DWARF
and "__ZTS4ImplIL7TagType0EE" when mangling symbols (which lldb demangles to
Impl<(TagType)0>). Thus, when lldb tries to resolve the type in
ItaniumABILanguageRuntime::GetTypeInfoFromVTableAddress(), it is unable to find
it. More cases and a detailed description of why lldb fails here can be found
in the clang review that tries to fix this on the clang side [1].

However, during the discussion around this review [2], it was pointed out that
DWARF names are expected to be close to the sources, which clang does
perfectly, whereas the mangling algorithm is strictly defined. Thus, matching
them on equality can sometimes fail. The idea suggested in [2] was to implement
more semantically aware matching. There is enough information in the DWARF to
semantically match "Impl<(TagType)0>" with "Impl<TagType::Tag1>", as the enum
TagType is in the DWARF and the enumerator Tag1 is present with its value 0. I
have some concerns about the performance of such a solution, but I'd like to
know your opinion about this idea in general. If it is approved, I'm going to
work on implementing it.

So what do you think about the type name inequality and the suggested solution?

[1] - https://reviews.llvm.org/D39622
[2] - http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20171211/212859.html

Thank you,
Anton.
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


[lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

2017-12-15 Thread xgsa via lldb-dev
Sorry, I probably shouldn't have used HTML for that message. Converted to plain 
text.

 Original message 
15.12.2017, 18:01, "xgsa" :

Hi,

I am working on an issue where, in C++ programs, showing the dynamic type based
on RTTI in lldb doesn't work properly for some complex cases involving
templates. Consider the following example:

enum class TagType : bool
{
    Tag1
};

struct I
{
    virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I
{
private:
    int v = 123;
};

int main(int argc, const char * argv[]) {
    Impl<TagType::Tag1> impl;
    I& i = impl;
    return 0;
}

For this example clang generates the type name "Impl<TagType::Tag1>" in DWARF
and "__ZTS4ImplIL7TagType0EE" when mangling symbols (which lldb demangles to
Impl<(TagType)0>). Thus, when lldb tries to resolve the type in
ItaniumABILanguageRuntime::GetTypeInfoFromVTableAddress(), it is unable to find
it. More cases and a detailed description of why lldb fails here can be found
in the clang review that tries to fix this on the clang side [1].

However, during the discussion around this review [2], it was pointed out that
DWARF names are expected to be close to the sources, which clang does
perfectly, whereas the mangling algorithm is strictly defined. Thus, matching
them on equality can sometimes fail. The idea suggested in [2] was to implement
more semantically aware matching. There is enough information in the DWARF to
semantically match "Impl<(TagType)0>" with "Impl<TagType::Tag1>", as the enum
TagType is in the DWARF and the enumerator Tag1 is present with its value 0. I
have some concerns about the performance of such a solution, but I'd like to
know your opinion about this idea in general. If it is approved, I'm going to
work on implementing it.
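
To make the "semantically aware matching" idea concrete, here is a minimal
sketch of one possible normalization step. It is an illustration only, not
lldb code: the EnumTable type and the NormalizeEnumArgs function are
hypothetical names, the enumerator table is assumed to have been filled from
the enum's debug info beforehand, and only casts of the form "(EnumName)N"
with a non-negative decimal N are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical table: enum type name -> (value -> enumerator name).
// A debugger would fill this from the enum's debug info; here the caller
// supplies it by hand.
using EnumTable = std::map<std::string, std::map<long long, std::string>>;

// Rewrites every "(EnumName)N" in a demangled name to "EnumName::Enumerator"
// when both the enum and the value are known. All other text is copied as-is.
std::string NormalizeEnumArgs(const std::string &demangled,
                              const EnumTable &enums) {
  std::string out;
  std::size_t i = 0;
  while (i < demangled.size()) {
    if (demangled[i] == '(') {
      std::size_t close = demangled.find(')', i);
      if (close != std::string::npos) {
        std::string type = demangled.substr(i + 1, close - i - 1);
        // Collect the decimal digits that follow the closing parenthesis.
        std::size_t end = close + 1;
        while (end < demangled.size() &&
               std::isdigit(static_cast<unsigned char>(demangled[end])))
          ++end;
        auto e = enums.find(type);
        if (end > close + 1 && e != enums.end()) {
          long long value =
              std::stoll(demangled.substr(close + 1, end - close - 1));
          auto v = e->second.find(value);
          if (v != e->second.end()) {
            out += type + "::" + v->second;
            i = end;
            continue;
          }
        }
      }
    }
    out += demangled[i++];
  }
  return out;
}
```

With a table that maps TagType value 0 to Tag1, the demangled name from the
example is rewritten to the spelling found in the DWARF, after which the
ordinary exact name lookup could be retried. Names with unknown enums are
returned unchanged.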

So what do you think about type names inequality and the suggested solution?

[1] - https://reviews.llvm.org/D39622
[2] - 
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20171211/212859.html

Thank you,
Anton.


Re: [lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

2017-12-18 Thread xgsa via lldb-dev
Thank you for the clarification, Jim. You are right, I misunderstood a little 
what lldb actually does.

It is not that the compiler can't be fixed; the point is that relying on the 
correspondence of mangled and demangled forms is not reliable enough, so we 
are looking for more robust alternatives. Moreover, I am not sure that such 
fuzzy matching could be done based on the class name alone, so it will require 
reading more DIEs. Taking into account that, for instance, our project has 
quite a lot of such types, it could noticeably slow down the debugger.

Thus, I'd like to mention one more alternative and get your feedback, if 
possible. What is actually needed is the correspondence between the mangled 
and demangled vtable symbol. Perhaps it is worth preparing a separate section 
during compilation (like e.g. apple_types) that would store this 
correspondence? It would be fast and more reliable than the current approach, 
but it would certainly increase the debug info size (I cannot estimate the 
exact increase, e.g. in percent).

What do you think? Which solution is preferable?

Thanks,
Anton.

15.12.2017, 23:34, "Jim Ingham" :
> First off, just a technical point. lldb doesn't use RTTI to find dynamic 
> types, and in fact works for projects like lldb & clang that turn off RTTI. 
> It just uses the fact that the vtable symbol for an object demangles to:
>
> vtable for CLASSNAME
>
> That's not terribly important, but I just wanted to make sure people didn't 
> think lldb was doing something fancy with RTTI... Note, gdb does (or at least 
> used to do) dynamic detection the same way.
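
The "vtable for CLASSNAME" step described above can be sketched as follows.
This is a minimal illustration, not lldb's implementation: it assumes a GCC or
Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>, and the
function name ClassNameFromVTableSymbol is made up for the example.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns CLASSNAME if `symbol` demangles to "vtable for CLASSNAME",
// otherwise an empty string.
std::string ClassNameFromVTableSymbol(const char *symbol) {
  int status = 0;
  char *demangled = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return std::string();
  std::string name(demangled);
  std::free(demangled); // the buffer is allocated with malloc
  const std::string prefix = "vtable for ";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return std::string();
  return name.substr(prefix.size());
}
```

For a symbol such as "_ZTV1I" this yields "I". Symbols that are not vtable
symbols, or that do not demangle at all, yield an empty string.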
>
> If the compiler can't be fixed, then it seems like your solution [2] is what 
> we'll have to try.
>
> As it works now, we get the CLASSNAME from the vtable symbol and look it up 
> in the list of types. That is pretty quick because the type names are 
> indexed, so we can find it with a quick search in the index. Changing this 
> over to a method where we do some additional string matching rather than just 
> using the table's hashing is going to be a fair bit slower because you have 
> to run over EVERY type name. But this might not be that bad. You would first 
> look it up by exact CLASSNAME and only fall back on your fuzzy match if this 
> fails, so most dynamic type lookups won't see any slowdown. And if you know 
> the cases where you get into this problem you can probably further restrict 
> when you need to do this work so you don't suffer this penalty for every 
> lookup where we don't have debug info for the dynamic type. And you could 
> keep a side-table of mangled-name -> DWARF name, and maybe a black-list for 
> unfound names, so you only have to do this once.
>
> This estimation is based on the assumption that you can do your work just on 
> the type names, without having to get more type information out of the DWARF 
> for each candidate match. A solution that relies on realizing every class in 
> lldb so you can get more information out of the type information to help with 
> the match will defeat all our attempts at lazy DWARF reading. This can cause 
> quite long delays in big programs. So I would be much more worried about a 
> solution that requires this kind of work. Again, if you can reject most 
> potential candidates by looking at the name, and only have to realize a few 
> likely types, the approach might not be that slow.
>
> Jim
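
The lookup order Jim outlines (exact match first, fuzzy match only as a
fallback, with a side-table for resolved names and a list of unfound names)
could look roughly like the sketch below. Every name in it is hypothetical and
none of it is lldb API. The two callbacks stand in for the indexed exact
lookup and the slow scan over all type names, and the cache is keyed by the
demangled class name rather than by the mangled name.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical helper; requires C++17 for std::optional.
class DynamicTypeNameResolver {
public:
  using Lookup = std::function<std::optional<std::string>(const std::string &)>;

  DynamicTypeNameResolver(Lookup exact, Lookup fuzzy)
      : exact_(std::move(exact)), fuzzy_(std::move(fuzzy)) {}

  // Maps a demangled class name to the name used in the debug info.
  std::optional<std::string> Resolve(const std::string &demangled) {
    if (auto hit = exact_(demangled)) // fast indexed lookup, the common case
      return hit;
    auto cached = resolved_.find(demangled); // earlier fuzzy success
    if (cached != resolved_.end())
      return cached->second;
    if (unfound_.count(demangled)) // earlier fuzzy failure
      return std::nullopt;
    if (auto hit = fuzzy_(demangled)) { // slow scan, at most once per name
      resolved_.emplace(demangled, *hit);
      return hit;
    }
    unfound_.insert(demangled);
    return std::nullopt;
  }

private:
  Lookup exact_;
  Lookup fuzzy_;
  std::unordered_map<std::string, std::string> resolved_; // side-table
  std::unordered_set<std::string> unfound_;               // names never found
};
```

The point of the two caches is that the expensive scan runs at most once per
distinct name, whether it succeeds or fails, so repeated dynamic type lookups
for the same class stay cheap.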

Re: [lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

2017-12-18 Thread xgsa via lldb-dev
Hi Tamas,

First, why DW_AT_MIPS_linkage_name, but not just DW_AT_linkage_name? The latter 
is standardized and currently generated by clang, at least on x64.

Second, this doesn't help to solve the issue, because it would require parsing 
all the DWARF types during startup to build a map, which breaks the lazy DWARF 
loading performed by lldb. Or am I missing something?

Thanks,
Anton.

18.12.2017, 22:59, "Tamas Berghammer" :
> Hi Anton and Jim,
>
> What do you think about storing the mangled type name or the mangled vtable 
> symbol name somewhere in DWARF in the DW_AT_MIPS_linkage_name attribute? We 
> are already doing it for the mangled names of functions so extending it to 
> types shouldn't be too controversial.
>
> Tamas

Re: [lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

2017-12-21 Thread xgsa via lldb-dev
21.12.2017, 13:45, "Pavel Labath via lldb-dev" :
> On 20 December 2017 at 18:40, Greg Clayton  wrote:
>>>  On Dec 20, 2017, at 3:33 AM, Pavel Labath  wrote:
>>>
>>>  On 19 December 2017 at 17:39, Greg Clayton via lldb-dev
>>>   wrote:
>>>>  The apple accelerator tables are only enabled for Darwin target, but there
>>>>  is nothing to say we couldn't enable these for other targets in ELF files.
>>>>  It would be a quick way to gauge the performance improvement that these
>>>>  accelerator tables provide for linux.
>>>
>>>  I was actually experimenting with this last month. Unfortunately, I've
>>>  learned that the situation is not as simple as flipping a switch in
>>>  the compiler. In fact, there is no switch to flip as clang will
>>>  already emit the apple tables if you pass -glldb. However, the
>>>  resulting tables will be unusable due to the differences in how dwarf
>>>  is linked on elf vs mach-o. In elf, we have the linker concatenate the
>>>  debug info into the final executable/shared library, which it will
>>>  also happily do for the .apple_*** sections.
>>
>>  That ruins the whole idea of the accelerator tables if they are 
>> concatenated...
>
> I'm not sure I'm convinced by that. I mean, obviously it's better if
> you have just a single table to look up, but even if you have multiple
> tables, looking up into each one may be faster than indexing the full
> debug info yourself. Take liblldb for example. It has ~3000 compile
> units and nearly 2GB of debug info. I don't have any solid data on
> this (and it would certainly be interesting to make this experiment),
> but I expect that doing 3000 hash lookups (which are basically just
> array accesses) would be faster than indexing 2GB of dwarf (where you
> have to deal with variable-sized fields and uleb encodings...). And
> there is always the possibility to do the lookups in parallel or merge
> the individual tables inside the debugger.
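
The two options mentioned above (probing each concatenated table in turn, or
merging the tables once inside the debugger) can be sketched as follows. This
is a simplified model under stated assumptions: NameIndex is a hypothetical
stand-in for one per-compile-unit accelerator table, modeled as an in-memory
hash map from a name to DIE offsets, and it ignores the real on-disk table
format and any offset fix-ups that concatenation would require.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-compile-unit index: name -> DIE offsets.
using NameIndex = std::unordered_map<std::string, std::vector<std::uint64_t>>;

// Option 1: probe every table. Cost is one hash lookup per table.
std::vector<std::uint64_t> LookupInAll(const std::vector<NameIndex> &tables,
                                       const std::string &name) {
  std::vector<std::uint64_t> result;
  for (const NameIndex &table : tables) {
    auto it = table.find(name);
    if (it != table.end())
      result.insert(result.end(), it->second.begin(), it->second.end());
  }
  return result;
}

// Option 2: merge once, then answer each query with a single lookup.
NameIndex Merge(const std::vector<NameIndex> &tables) {
  NameIndex merged;
  for (const NameIndex &table : tables) {
    for (const auto &entry : table) {
      std::vector<std::uint64_t> &dst = merged[entry.first];
      dst.insert(dst.end(), entry.second.begin(), entry.second.end());
    }
  }
  return merged;
}
```

Option 1 needs no extra memory but pays one probe per table on every query.
Option 2 pays a one-time merge cost proportional to the total number of
entries, which is still far less work than parsing the DWARF itself.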
>
>>>  The second, more subtle problem I see is that these tables are an
>>>  all-or-nothing event. If we see an accelerator table, we assume it is
>>>  an index of the entire module, but that's not likely to be the case,
>>>  especially in the early days of this feature's uptake. You will have
>>>  people feeding the linkers with output from different compilers, some
>>>  of which will produce these tables, and some not. Then the users will
>>>  be surprised that the debugger is ignoring some of their symbols.
>>
>>  I think it is best to auto generate the tables from the DWARF directly 
>> after it has all been linked. Skip teaching the linker about merging it, 
>> just teach it to generate it.
>
> If the linker does the full generation, then how is that any better
> than doing the indexing in the debugger? Somebody still has to parse
> the entire dwarf, so it might as well be the debugger. 

I suppose the difference is that the linker does it once, whereas the debugger 
has to do it on every startup, as the results are not saved anywhere (or are 
they?). So, instead of having the compiler build accelerator tables for the 
debugger, perhaps the debugger should save its own indexes somewhere (e.g. in 
a cache file next to the binary)? Or is there already such a mechanism that I 
just don't know about?

> I think the
> main advantage of doing it in the compiler is that the compiler
> already has all the data about what should go into the index ready, so
> it can just build it as it goes about writing out the object file.
> Then, the merging should be a relatively simple and fast operation
> (and the linker does not even have to know how to parse dwarf). Isn't
> this how the darwin workflow works already?
>
>>>  This is probably a bit more work than just "flipping a switch", but I
>>>  hope it will not be too much work. The layout and contents of the
>>>  tables are generally the same, so I am hoping most of the compiler
>>>  code for the apple tables can be reused for the dwarf5 tables. If
>>>  things turn out the way I want them to, I'll be able to work on
>>>  getting this done next year.
>>
>>  Modifying llvm-dsymutil to handle ELF so we can use "llvm-dsymutil --update 
>> foo.elf" is the quickest way that doesn't involve modifying anything but 
>> llvm-dsymutil. It will generate the accelerator tables manually and 
>> add/modify the existing accelerator tables and write out the new elf file 
>> that is all fixed up. I would suggest going this route at first to see what 
>> performance improvements we will see with linux so that can drive how 
>> quickly we need to adopt this.
>
> I'm not sure now whether you're suggesting to use the dsymutil
> approach just to gauge the potential speedup we can obtain and get
> people interested, or as a productized solution. If it's the first one
> then I fully agree with you. Although I think I can see an even
> simpler way to estimate the speedup: build lldb for mac with apple
> indexes disabled and compare its performance to a vanilla one. I'm
> going to see if I can get so