clayborg added a comment.

In D67390#1672210 <https://reviews.llvm.org/D67390#1672210>, @kwk wrote:

> @clayborg thank you for this explanation. My patch for minidebuginfo is 
> already done in D66791 <https://reviews.llvm.org/D66791> . But this one here 
> I was asked to pull out as a separate patch. For good reasons as we see. I 
> wonder how this extra parameter `SymbolMapType` of yours does help. In the 
> end I have to identify duplicates. But if no symbol with the same **name** 
> should be added then why do I care about where the symbol is coming from?


The uniqueness is for symbols with the same name and file address and size. You 
can have multiple symbols with the same name, and each on could have a 
different address. We want there to only be one symbol per ObjectFile that has 
the same name + addr + size. That way when we ask for symbols by name, we don't 
have to end up getting more than one symbol for something that is the same 
thing.

> Please help me understand of follow my thoughts here:
> 
> When I'm in the `(lldb)` console and type `b foo` I expect LLDB to set a 
> breakpoint on the **function** foo, right? The type of the symbol `foo` is 
> deduced as **function**.

Breakpoints ask for symbols whose type is lldb::eSymbolTypeCode and that match 
the name. The name matching is much more complex than you would think though. 
"b foo" by default turns into 'set a breakpoint on functions whose "basename" 
matches foo'. This means a C function named 'foo', any C++ method (stand alone 
function or class method) whose basename is 'foo' (bar::foo(int)", "foo(int)", 
"foo(float)", "std::a::b::foo()", many more) and any objective C function whose 
selector is 'foo' ("-[MyClass foo]", "+[AnotherClass foo]", and any other 
basename from any other language.

If you type "b foo::bar" this will end up looking up all functions whose 
basename is "bar" and then making sure any found matches contain "foo::bar".

> I ask this question because `Symtab` has no function to just search for a 
> symbol by name; instead you always have to pass an address, a type or an ID:
> 
>   Symbol *FindSymbolByID(lldb::user_id_t uid) const;
>   Symbol *FindSymbolWithType(lldb::SymbolType symbol_type,
>   size_t FindAllSymbolsWithNameAndType(ConstString name,
>   size_t FindAllSymbolsWithNameAndType(ConstString name,
>   size_t FindAllSymbolsMatchingRexExAndType(
>   Symbol *FindFirstSymbolWithNameAndType(ConstString name,
>   Symbol *FindSymbolAtFileAddress(lldb::addr_t file_addr);
>   Symbol *FindSymbolContainingFileAddress(lldb::addr_t file_addr);
>   size_t FindFunctionSymbols(ConstString name, uint32_t name_type_mask,
>    

FindAllSymbolsWithNameAndType() takes a name. The symbol type can be 
lldb::eSymbolTypeAny, so yes there is way to search. So there is a way to 
search for a symbol by name. The main issue is we are still parsing the symbol 
table and we don't want to initialized the name lookup table in the symbol 
table just yet since we are still adding symbols to the complete list. This is 
why an extra map that is used only during parsing of the symbol table makes 
sense.

> So my point of this whole question is: What makes a symbol unique in the 
> sense that it shouldn't be added to the symtab if it is already there?

Symbol  is unique within one representation of an ObjectFile where the symbol 
has the same name, address and size and type.

> Shouldn't the type of the symbol together with it's name define uniqueness? 
> We shouldn't care about where the symbol is coming from nor if it is located 
> at a different address.

We don't care where the symbol comes from as long as it is representing 
information for one ObjectFile. I would contend that if you have an "a.out" 
binary that has a .gnu_debugdata that points to "a.out.gnu_debugdata" that we 
would have one single ObjectFile that represents "a.out" and give the best 
description of what "a.out" contains.

We do care if a symbol has a different address. You can have as many static 
functions called "foo" as you want in a single binary. They are each unique 
since they have different addresses. So if you have 10 source files where 3 of 
those sources have a symbol "foo", we want there to be 3 different symbols with 
the same name, different addresses and possibly different sizes.

> Well, if there's an overloaded function `foo(int)` and `foo(char*)` then both 
> symbols are of type **function** and they both share the same **name**. When 
> you type `b foo` you DO want 2 breakpoints to be set.

Yes we do! One breakpoint in LLDB has N breakpoint locations. A breakpoint of 
any kind (source file and line, function name breakpoint, and more) all 
constantly adding and removing locations as shared libraries get loaded. So if 
you have "foo(int)" and "foo(char *)" and say "b foo" we would end up with one 
breakpoint whose name matches "foo" with two locations.

> Hence, niqueness cannot be defined over the name and the type . But wait, the 
> **name** is mangled, so it IS unique enough unless I use 
> `Symbol::GetNameNoArguments()`; there only the name is returned.

Again, there can be N breakpoints with the same name and different addresses. 
We are just trying to avoid a symbol table that has 3 symbols all with the name 
"foo", address of 0x1000 and size of 0x10. Why? Because the information is 
redundant and is just noise. The map I suggested would track these symbols so 
we don't end up adding multiple of the same symbols. Again, name address and 
size must all match. We _can_ have multiple symbols at the same address with 
different names. How? Linkers often have aliased symbols that point to the same 
thing. if we have a symbol "foo" at 0x1000 with size 0x10, we can also have a 
symbol "foo_alias" at 0x1000 and size 0x10. We need both so that we can set 
breakpoints correctly when setting breakpoints by name.

> Here's my naive approach to test the admittedly very weird thought process 
> from above: 
> https://github.com/kwk/llvm-project/commit/5da4559a00c73ebefd8f8199890bd1991c94fa3f

I will take a look.

Let me know if you have any questions about what I said above.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67390/new/

https://reviews.llvm.org/D67390



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to