labath added a comment.

Sorry for the overly long post. I tried to reply to all messages from last 
night. I don't claim that all of my comments are consistent with one another -- 
that's a reflection of the fact that I really don't know what the right 
solution is...

In D78801#2010713 <https://reviews.llvm.org/D78801#2010713>, @paolosev wrote:

> Also, to eliminate `qWasmCallStack` we could maybe add the call stack (a list 
> of PCs) to the stop packet. The format of a stop packet is:
>
>   T AA n1:r1;n2:r2;…
>  
>   The program received signal number AA (a two-digit hexadecimal number). 
> [...] Each ‘n:r’ pair is interpreted as follows:
>   If n is a hexadecimal number, it is a register number, and the 
> corresponding r gives that register’s value. [...]
>   If n is ‘thread’, then r is the thread-id of the stopped thread, as 
> specified in thread-id syntax.
>   If n is ‘core’, then r is the hexadecimal number of the core on which the 
> stop event was detected.
>   Otherwise, GDB should ignore this ‘n:r’ pair and go on to the next; this 
> allows us to extend the protocol in the future.
>
>
> So adding a 'stack:xxxxx...xxx' pair should be possible. If we reuse `m`, `M` 
> in place of qWasmLocal, qWasmGlobal and qWasmStackValue and add generic 
> qMemSpaceRead/qMemSpaceWrite for the memory, then there would be no new 
> Wasm-specific command to add to the protocol.


I don't see a real difference between these two options. The only reason the 
`stack:` key is not wasm-specific is because you excluded wasm from the name. 
If we renamed `qWasmCallStack` to `qCallStack`, the effect would be the same. 
For me the question is more fundamental -- who/how/why should be computing the 
call stack, not the exact protocol details.

It may also be interesting to note that the stop-reply packet kind of also 
includes the call stack information via the `memory` field. The idea there is 
that the stub will walk the FP chain and send over the relevant bits of memory. 
However, there is a big difference -- all of this is just a hint to the client 
and an optimization to reduce packet count. It's still the client who 
determines the final call stack.
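To make the "unknown pairs are ignored" property concrete, here is a minimal sketch (not actual lldb code; the `stack:` key is the hypothetical extension discussed above) of how a client would parse a stop-reply packet while tolerating pairs it does not understand:

```python
def parse_stop_reply(packet):
    """Parse a 'T AAn1:r1;n2:r2;...' stop-reply into (signal, pairs).

    Pairs with unrecognized names (e.g. a hypothetical 'stack:...')
    are simply collected; per the protocol, a client that does not
    understand them must skip them and move on.
    """
    assert packet.startswith("T")
    signal = int(packet[1:3], 16)  # two-digit hex signal number
    pairs = {}
    for field in packet[3:].split(";"):
        if not field:
            continue
        name, _, value = field.partition(":")
        pairs[name] = value
    return signal, pairs

sig, pairs = parse_stop_reply("T05thread:1;stack:0000a01f0000b020;")
```

This is exactly why adding a `stack:` pair is backward compatible: older clients fall into the "otherwise, ignore" branch of the spec.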

In D78801#2010598 <https://reviews.llvm.org/D78801#2010598>, @paolosev wrote:

> I really like the idea of defining generic commands for reading and writing 
> memory on systems with multiple address spaces! In fact there is no reason 
> why that should be Wasm-specific.


I know a lot of people are interested in adding address spaces to lldb. But I 
don't think the problem is adding a gdb-remote extension to the `m` packet -- 
that is the easy part. The same thing could be done to the "memory read" 
command in lldb. The interesting stuff begins when you want to go beyond that: 
If the address space is just an opaque number, then what does that mean? What 
can lldb do with such address spaces? How do they map to the address spaces in 
llvm? How do they relate to addresses in object files (no common format has 
support for them)? How about addresses in debug info? etc.

If dwarf had a notion of address spaces then there'd probably be no need for 
DW_OP_WASM_location. If the dwarf committee hasn't been able to come up with a 
unified way to represent address spaces, then I'm not sure we will do better. 
And we'll still be left with the job of translating dwarf (and other) entries 
into this other concept.

In D78801#2010471 <https://reviews.llvm.org/D78801#2010471>, @paolosev wrote:

> When you say
>
> > translate DW_OP_WASM_location into an appropriate FP+offset combo.
>
> do you mean that LLVM should generate these `FP+offset` combos rather than 
> `DW_OP_WASM_location` or that LLDB should somehow do this translation?


I meant the latter, though if we could somehow achieve the former, it would be 
even better, obviously. :)

> I think the engine can do more to help, here, but not a lot more; I am 
> afraid. Yes, it could expose an implied “SP” and “FP”, and that should be 
> sufficient to represent locals and arguments and make stack walking more 
> orthodox. But DW_OP_WASM_location also describes locations in the set of wasm 
> globals and in the Wasm operand stack, so we would need at least a second, 
> parallel stack to represent the operand stack.
> 
> Also, for C++ LLVM emits code to maintain a “shadow stack” in the linear 
> memory of the module, and location expressions like `DW_OP_fbreg +N` are 
> already used to describe the location of a parameter or a local variable in 
> that shadow stack. The stack frame pointer for that function is described 
> with `DW_AT_frame_base`, expressed as a DW_OP_WASM_location expression.

So, the pointer to this "shadow stack" is one of the function arguments, 
represented as DW_OP_WASM_location 0x0 (local) + constant. And then we get to a 
variable on the shadow stack by "dereferencing" this value, and adding another 
constant. Is that right?

That sounds like it should be expressible in standard dwarf. If we set 
DW_AT_frame_base to `DW_OP_regX FP`, then "real" arguments could be expressed 
as `DW_OP_fbreg +Y` and "shadow" arguments as `DW_OP_fbreg 
+index_of_shadow_argument, DW_OP_deref, DW_OP_const Y, DW_OP_add`.
I guess the reason this hasn't been done is because that would give off the 
impression that all of these things are in memory, in the same address space, 
but the "real" arguments don't have a memory location at all? But if they don't 
have a memory location, is it right to represent them as address spaces?
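For illustration, the hypothetical "standard dwarf" expression above could be evaluated like this (a toy stack-machine sketch with made-up addresses; in reality the frame base and memory come from the target):

```python
def eval_shadow_location(frame_base, memory, n, y):
    """Evaluate 'DW_OP_fbreg +N, DW_OP_deref, DW_OP_const Y, DW_OP_add'.

    frame_base: value of DW_AT_frame_base (here: DW_OP_regX FP)
    memory:     fake target memory, address -> 8-byte value
    n:          slot where the shadow-stack pointer is spilled
    y:          offset of the variable within the shadow stack frame
    """
    stack = []
    stack.append(frame_base + n)       # DW_OP_fbreg +N
    stack.append(memory[stack.pop()])  # DW_OP_deref: load shadow-stack ptr
    stack.append(stack.pop() + y)      # DW_OP_const Y; DW_OP_add
    return stack.pop()                 # address of the variable

memory = {0x1008: 0x2000}  # shadow-stack pointer spilled at FP+8
addr = eval_shadow_location(0x1000, memory, 8, 4)
```

The sketch also shows where the scheme strains: every intermediate value is treated as a memory address in a single flat address space, which is precisely what does not hold for the "real" (non-shadow) wasm locals.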

> In the end walking the stack is not a big problem, its logic can already be 
> encapsulated in a Unwind-derived plugin class.

That is true, and there are elements of the current solution there that I like 
a lot. The thing I am resisting is to put all of this stuff in the 
ProcessGDBRemote class. Instead of trying to generalize it so that it can 
handle everything (and generalize to the wrong thing), I very much like the 
idea of introducing a `WasmProcess` plugin class that handles all wasm stuff. 
If that class happens to use parts of gdb-remote, then so be it, but it means 
that it's not a problem for that class to use a dialect of the gdb-remote 
protocol, which includes as many wasm-specific packets as it needs. Then this 
class can create its own implementation of the "Unwind" interface, which will 
use some WasmProcess-specific APIs to unwind, but that will also be ok, since 
both classes will be wasm-specific.
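A rough sketch of that layering (all class and packet names are hypothetical, and the real implementation would be C++; the point is only who owns what):

```python
class WasmProcess:
    """Process plugin that speaks a gdb-remote dialect with wasm packets."""

    def __init__(self, transport):
        self.transport = transport

    def get_wasm_call_stack(self):
        # The wasm-specific packet lives here, inside the wasm plugin,
        # not in the generic ProcessGDBRemote class.
        reply = self.transport.send("qWasmCallStack")
        # Assume the reply is a sequence of hex-encoded 64-bit PCs.
        return [int(reply[i:i + 16], 16) for i in range(0, len(reply), 16)]


class WasmUnwind:
    """Unwind implementation that only works with a WasmProcess."""

    def __init__(self, process):
        self.process = process

    def compute_frames(self):
        return self.process.get_wasm_call_stack()


class FakeTransport:
    """Stand-in for the gdb-remote connection, for illustration only."""

    def send(self, packet):
        assert packet == "qWasmCallStack"
        return "0000000000000040" "0000000000000080"


frames = WasmUnwind(WasmProcess(FakeTransport())).compute_frames()
```

Since both classes are wasm-specific by construction, neither needs to pretend its interface is generic.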

The question is whether something similar can be done for the other two cases. 
I believe it might be possible for the DWARFExpression. We just need to have a 
way to separate the creation of the dwarf expression data from the process of 
evaluating it. Right now, both of these things happen in SymbolFileDWARF -- it 
creates a DWARFExpression object which both holds the data and knows how to 
evaluate it. But I don't believe SymbolFileDWARF needs the second part. If we 
could make it so that something else is responsible for creating the evaluator 
for dwarf expressions, that something could create a WasmDWARFExpression which 
would know about DW_OP_WASM_location and WasmProcess, and could evaluate it.
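The split could look something like this (hypothetical names; the real DW_OP encodings for the standard opcodes are from DWARF, and 0xed is the vendor opcode LLVM uses for DW_OP_WASM_location):

```python
DW_OP_constu = 0x10         # standard DWARF opcode
DW_OP_WASM_location = 0xed  # LLVM vendor extension for wasm


class DWARFEvaluator:
    """Evaluates standard opcodes; knows nothing about wasm."""

    def evaluate(self, ops):
        stack = []
        for op, arg in ops:
            self.evaluate_op(op, arg, stack)
        return stack[-1]

    def evaluate_op(self, op, arg, stack):
        if op == DW_OP_constu:
            stack.append(arg)
        else:
            raise NotImplementedError(hex(op))


class WasmDWARFEvaluator(DWARFEvaluator):
    """Adds DW_OP_WASM_location on top of the common evaluator."""

    def __init__(self, process):
        self.process = process

    def evaluate_op(self, op, arg, stack):
        if op == DW_OP_WASM_location:
            kind, index = arg  # 0 = local, 1 = global (simplified)
            stack.append(self.process.get_local(index) if kind == 0
                         else self.process.get_global(index))
        else:
            super().evaluate_op(op, arg, stack)


class FakeWasmProcess:
    def get_local(self, index): return 100 + index
    def get_global(self, index): return 200 + index


result = WasmDWARFEvaluator(FakeWasmProcess()).evaluate(
    [(DW_OP_WASM_location, (0, 3))])
```

SymbolFileDWARF would keep producing the expression *data*; only the choice of evaluator class would be deferred to whoever knows the target.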

The `GetValueAsData` problem is trickier, particularly as I'm not even sure the 
current implementation is correct. Are you sure that you really need the 
_current_ module there? What happens if I use "target variable" to display a 
variable from a different module? What if I then dereference that variable? If 
the dereference should happen in the context of the other module, then I guess 
the "module" should be a property of the value, not of the current execution 
context. And it sounds like some address space-y thing would help. But that 
might require teaching a lot of places in lldb about address spaces, in 
particular that dereferencing a value in one address space should produce a 
value in the same address space (at least for "near" pointers or something).
If we should really use the current frame as the context, then I guess we'd 
need some sort of interface to ask a stack frame to get the value of a 
"Value".

> 
> 
>> And I believe the current problems are just the tip of the iceberg. I can't 
>> imagine what hoops we'll need to jump through once we start evaluating 
>> expressions...
> 
> Expression evaluation works, in my prototype, for simple expressions.  For 
> complex expressions I see logged errors like this, in 
> `IRInterpreter::CanInterpret()`:
> 
>   Unsupported instruction: %call = call float 
> @_ZNK4Vec3IfE3dotERKS0_(%class.Vec3* %7, %class.Vec3* dereferenceable(12) %8)
> 
> 
> It’s not clear to me if the problem is caused by the debug symbols or by the 
> IR generated for Wasm… is there any doc where I could learn more about 
> expression evaluation in LLDB? It’s a topic that really interests me, even 
> outside the scope of this Wasm work.

Yes, this is because complex expressions require us to inject code into the 
inferior to run it. I expect that to be quite a tough nut to crack. Even so, I 
do find it pretty impressive that simple expressions work.

I am not aware of any good documentation for the expression evaluator. Probably 
the closest thing are some devmtg tutorials.

In D78801#2011511 <https://reviews.llvm.org/D78801#2011511>, @clayborg wrote:

> > Not only that, we might need to pass in the entire DWARF stack, in case the 
> > opcode depends on some of the stack arguments.
>
> Yes


In principle, I am fine with having a "Process" (or someone else -- we may want 
to do this for not-yet-started processes à la "target variable" too) specifying 
the semantics of a dwarf expression. Though in that case, I think it would be 
cleaner to just have this entity provide a "DWARFEvaluator" object, which will 
handle the entirety of the evaluation. The fact that 99% of the different 
evaluators will be identical can be handled by putting that code in a common 
base class.

> I would really like to see a solution that does not include "Wasm" in any 
> process or thread virtual functions. I am fine with something like:
> 
>   enum IndexedVariableType {
>    Global,
>    Local,
>    Stack
>   };
>   
>   Process::GetIndexedVariable(IndexedVariableType type, size_t index, ...)

The thing I fear with such a solution is that it will be wasm-specific in 
everything but the name. It's very hard to create a good generic interface with 
just a few (one?) data points. There are plenty of examples for that in lldb, 
where an API tries to look very generic, but in reality it only makes sense for 
darwin/macho/objc...

> Actually, that brings up an idea: a process runtime plug-in for languages 
> that have a runtime that can be communicated with. Java has a runtime, Wasm 
> has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be 
> too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), 
> GetStack(index) to this plug-in?

I think that is a very interesting idea to explore. We already have language 
runtime plugins, and they do have the ability to refine the type of a 
variable/value. Maybe they could also somehow help with computing the value of 
a variable? Then instead of a GetGlobal/Local(index), we could have 
`GetValue(Variable)`, and they would be the ones responsible for making sense 
of the dwarf expressions and address spaces?
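A sketch of what such a runtime plugin interface might look like (entirely hypothetical; lldb's existing LanguageRuntime API does not have these methods):

```python
class ProcessRuntime:
    """Abstract runtime interface; one subclass per runtime (wasm, JVM, ...)."""

    def get_global(self, index): raise NotImplementedError
    def get_local(self, index): raise NotImplementedError
    def get_stack(self, index): raise NotImplementedError


class WasmRuntime(ProcessRuntime):
    def __init__(self, values):
        self.values = values

    def get_local(self, index):
        return self.values["locals"][index]


def get_value(runtime, variable):
    # 'variable' stands in for a decoded DWARF location; here it is
    # simplified to a (kind, index) pair.
    kind, index = variable
    dispatch = {"local": runtime.get_local,
                "global": runtime.get_global,
                "stack": runtime.get_stack}
    return dispatch[kind](index)


v = get_value(WasmRuntime({"locals": [7, 42]}), ("local", 1))
```

The generic layer would only ever call `get_value`; everything runtime-specific stays behind the plugin boundary.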


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78801/new/

https://reviews.llvm.org/D78801



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits