> On Feb 27, 2019, at 10:12 AM, Zachary Turner <ztur...@google.com> wrote:
> 
> 
> 
> On Tue, Feb 26, 2019 at 5:39 PM Frédéric Riss <fr...@apple.com 
> <mailto:fr...@apple.com>> wrote:
> 
>> On Feb 26, 2019, at 4:52 PM, Zachary Turner <ztur...@google.com 
>> <mailto:ztur...@google.com>> wrote:
>> 
>> 
>> 
>> On Tue, Feb 26, 2019 at 4:49 PM Frédéric Riss <fr...@apple.com 
>> <mailto:fr...@apple.com>> wrote:
>> 
>>> On Feb 26, 2019, at 4:03 PM, Zachary Turner <ztur...@google.com 
>>> <mailto:ztur...@google.com>> wrote:
>>> 
>>> I would probably build the server by using mostly code from LLVM.  Since it 
>>> would contain all of the low-level debug info parsing libraries, I would 
>>> expect that all knowledge of debug info (at least, in the form that 
>>> compilers emit it in) could eventually be removed from LLDB entirely.
>> 
>> That’s quite an ambitious goal.
>> 
>> I haven’t looked at the SymbolFile API, what do you expect the exchange 
>> currency between the server and LLDB to be? Serialized compiler ASTs? If 
>> that’s the case, it seems like you need a strong rev-lock between the server 
>> and the client. Which in turn adds quite some complexity to the rollout of 
>> new versions of the debugger.
>> 
>> Definitely not serialized ASTs, because you could be debugging some language 
>> other than C++.  Probably something more like JSON, where you parse the 
>> debug info and send back some JSON representation of the type / function / 
>> variable the user requested, which can almost be a direct mapping to LLDB's 
>> internal symbol hierarchy (e.g. the Function, Type, etc classes).  You'd 
>> still need to build the AST on the client.
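As a rough illustration of the exchange described above, here is a minimal sketch of a client decoding one JSON reply into plain records. Everything in it is an assumption made for illustration: the field names (`kind`, `return_type`, `low_pc`, `high_pc`), the hex-string address encoding, and the `Function` and `TypeRef` dataclasses are hypothetical stand-ins, not LLDB's actual classes and not a schema proposed in this thread.

```python
import json
from dataclasses import dataclass


@dataclass
class TypeRef:
    # Hypothetical minimal type reference; not LLDB's Type class.
    name: str
    byte_size: int


@dataclass
class Function:
    # Hypothetical client-side record; not LLDB's Function class.
    name: str
    return_type: TypeRef
    low_pc: int
    high_pc: int


def function_from_json(text):
    """Decode one (assumed) server reply into a Function record."""
    obj = json.loads(text)
    if obj["kind"] != "function":
        raise ValueError(f"unexpected kind: {obj['kind']}")
    rt = obj["return_type"]
    return Function(
        name=obj["name"],
        return_type=TypeRef(rt["name"], rt["byte_size"]),
        # Addresses are assumed to travel as hex strings such as "0x1000".
        low_pc=int(obj["low_pc"], 16),
        high_pc=int(obj["high_pc"], 16),
    )


# Made-up reply for a function lookup; the values are illustrative only.
reply = (
    '{"kind": "function", "name": "main", '
    '"return_type": {"name": "int", "byte_size": 4}, '
    '"low_pc": "0x1000", "high_pc": "0x1040"}'
)
fn = function_from_json(reply)
```

The client in this sketch never sees the debug info format itself, only the record, which is the property the JSON suggestion relies on. Building the AST from such records would still be separate client-side work.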
> 
> This seems fairly easy for Function or symbols in general, as it’s easy to 
> abstract their few properties, but as soon as you get to the type system, I 
> get worried.
> 
> Your representation needs to have the full expressivity of the underlying 
> debug info format. Inventing something new in that space seems really 
> expensive. For example, every piece of information we add to the debug info 
> in the compiler would need to be handled in multiple places:
>  - the server code
>  - the client code that talks to the server
>  - the current “local” code (for a pretty long while)
> Not ideal. I wish there was a way to factor out at least the last two.
> 
> How often does this actually happen though?  The C++ type system hasn't 
> really undergone very many fundamental changes over the years.

I think over the last year we’ve done at least a couple of extensions to what we 
put in DWARF (for ObjC classes and ARM PAC support, which is not upstream yet). 
Adrian usually does those evolutions, so he might have a better idea. We plan 
on potentially adding a bunch more information to DWARF to more accurately 
represent the Obj-C type system.  

>   I mocked up a few samples of what some JSON descriptions would look like, 
> and it didn't seem terrible.  It certainly is some work -- there's no denying 
> it -- but I think a lot of the "expressivity" of the underlying format is 
> actually more accurately described as "flexibility".  What I mean by this is 
> that there are both many different ways to express the same thing, as well as 
> many entities that can express different things depending on how they're 
> used.  An intermediate format gives us a way to eliminate all of that 
> flexibility and instead offer consistency, which makes client code much 
> simpler.  In a way, this is a similar benefit to what one gets by compiling a 
> source language down to LLVM IR and then operating on the LLVM IR because you 
> have a much simpler grammar to deal with, along with more semantic 
> restrictions on what kind of descriptions you form with that grammar (to be 
> clear: JSON itself is not restrictive, but we can make our schema 
> restrictive).
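To make the "flexibility" point concrete, here is a small sketch of two encodings of the same fact collapsing into one canonical description. DWARF allows `DW_AT_data_member_location` to be either a constant or a location expression such as `DW_OP_plus_uconst`. The dict-based input records, the function names, and the output schema below are all hypothetical simplifications, not the mock-ups mentioned above and not any real parser's API.

```python
import json


def member_offset(raw):
    """Return a byte offset from either (simplified) encoding of the location."""
    loc = raw["data_member_location"]
    if isinstance(loc, int):
        # Constant form: the attribute value is the offset itself.
        return loc
    # Expression form, modeled here as a tuple like ("DW_OP_plus_uconst", 4).
    op, operand = loc
    if op != "DW_OP_plus_uconst":
        raise ValueError(f"unsupported location expression: {op}")
    return operand


def describe_struct(name, byte_size, raw_members):
    """Build a canonical, JSON-ready description (hypothetical schema)."""
    return {
        "kind": "struct",
        "name": name,
        "byte_size": byte_size,
        "members": [
            {"name": m["name"], "type": m["type"], "offset": member_offset(m)}
            for m in raw_members
        ],
    }


# Two made-up producers describing the same struct in different ways.
producer_a = [
    {"name": "x", "type": "int", "data_member_location": 0},
    {"name": "y", "type": "int", "data_member_location": 4},
]
producer_b = [
    {"name": "x", "type": "int", "data_member_location": ("DW_OP_plus_uconst", 0)},
    {"name": "y", "type": "int", "data_member_location": ("DW_OP_plus_uconst", 4)},
]

json_a = json.dumps(describe_struct("foo", 8, producer_a), sort_keys=True)
json_b = json.dumps(describe_struct("foo", 8, producer_b), sort_keys=True)
```

Both inputs serialize to the same string, so a client or a test that compares the canonical form would not notice which encoding the producer chose.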

What I’m worried about is not exactly the amount of work, just the scope of the 
new abstraction. It needs to be good enough for any language and any debug 
information format. It needs efficient implementations of at least symbols, 
types, decl contexts, frame information, location expressions, target register 
mappings... And it’ll require the equivalent of the various ASTParser 
implementations. That’s a lot of new and forked code. I’d feel way better if we 
were able to reuse some of the existing code. I’m not sure how feasible this is 
though.

> For what it's worth, in an earlier message I mentioned that I would probably 
> build the server by using mostly code from LLVM, and making sure that it 
> supported the union of things currently supported by LLDB and LLVM's DWARF 
> parsers.  Doing that would naturally require merging the two (which has been 
> talked about for a long time) as a pre-requisite, and I would expect that for 
> testing purposes we might want something like llvm-dwarfdump but that dumps a 
> higher level description of the information (if we change our DWARF emission 
> code in LLVM for example, to output the exact same type in slightly different 
> ways in the underlying DWARF, we wouldn't want our test to break, for 
> example).  So for example imagine you could run something like 
> `lldb-dwarfdump -lookup-type=foo a.out` and it would dump some description of 
> the type that is resilient to insignificant changes in the underlying DWARF.

At which level do you consider the “DWARF parser” to stop and the debugger 
policy to start? In my view, the DWARF parser stops at the DwarfDIE boundary. 
Replacing it wouldn’t get us closer to a higher-level abstraction.

> At that point you're already 90% of the way towards what I'm proposing, and 
> it's useful independently.


I think that “90%” figure is a little off :-) But please don’t take my 
questions as opposition to the general idea. I find the idea very interesting, 
and we could maybe use something similar internally so I am interested. That’s 
why I’m asking questions.

Fred

_______________________________________________
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
