[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

Pavel Labath via lldb-commits Mon, 03 Feb 2025 04:55:27 -0800

labath wrote:

Thanks for the summary, Andy.


> I went through the thread to understand the current consensus and I see 
> various nice ideas flying around. Let me try to summarize the requirements 
> and the current status quo.
> 
>   * DIL should support synthetic children, whose names can be arbitrary in 
> general case, even `a+b*c`
> 
>    * A common synthetic name is `[0]`, which is used for children of 
> vectors/maps and people want to write `print vec[0]`
>       
>      * `frame variable` supports this by having special support for `[]` 
> "expressions"
> 
>    * We want DIL to be easy & convenient to use in most (simple) cases, but 
> also to be able to support complicated cases and it doesn't have to be 
> _super_ convenient for those

SGTM

> Possible behaviour for DIL:
> 
>   * Make the definition of `identifier` in Lexer to roughly match C or similar
> 
>   * Introduce escaping of identifiers, e.g. with backticks
>       
>     * The expression `` foo->`a*b.c`+1 `` is parsed as approx 
> `foo.GetChildWithName("a*b.c") + 1`

I already spoke at length about identifier names. Quoting of fancy names SGTM. 
I don't think it's relevant for this patch (lexing), but since you're also 
mentioning the wider syntax of the language, I want to mention that there's 
also another kind of disambiguation to consider. Forcing a string to be treated 
as an identifier is one thing. Another question is forcing an identifier to be 
treated as a specific kind of entity. For example, if there's a variable and a 
type with the same name, can we say which one we mean? Or can we say we want to 
see the value of a global variable `foo` in a specific compile unit? Or in an 
outer scope that's shadowed by another variable?

We don't exactly support that right now, but e.g. `target variable foo` will 
print *all* global variables with that name. That gets trickier with a more 
complicated parser, because how do you print the result of `global1+global2` if 
there are multiple candidates for each name?
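
To make the quoting idea concrete, here is a toy tokenizer sketch. It is Python rather than the C++ lexer in this PR, and the token kinds and the `tokenize` helper are made up for illustration. It only shows that a backtick-quoted name can be lexed as a single identifier token regardless of the characters inside it:

```python
import re

# Hypothetical token kinds for illustration only; the actual DIL lexer may differ.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<quoted>`[^`]*`)                     # backtick-quoted identifier
  | (?P<identifier>[A-Za-z_][A-Za-z_0-9]*)  # C-like identifier
  | (?P<number>[0-9]+)
  | (?P<punct>->|[.+\-*/\[\]()])
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, spelling) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind == "quoted":
            # Strip the backticks; the content becomes an ordinary identifier.
            tokens.append(("identifier", m.group()[1:-1]))
        elif kind != "ws":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens
```

With this sketch, `` tokenize("foo->`a*b.c`+1") `` yields an identifier `foo`, the `->` punctuator, an identifier `a*b.c`, a `+`, and the number `1`. Note that it says nothing about the second kind of disambiguation (variable vs. type, or which compile unit); that would need separate syntax.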

> 
>     * Add special support for `[]` in the parser
>       
>       * The expression `` foo.`[1]`  `` is parsed as 
> `foo.GetChildWithName("[1]")`
>       * The expression `foo[1]` tries the following:
>         
>         * `foo.GetChildWithName("1")`
>         * `foo.GetChildWithName("[1]")`
>         * `foo.GetChildAtIndex(1)`

I'd go with just the second option because I'd like to avoid ambiguities and 
fallbacks. I think that's more or less the status quo, at least for the types 
with child providers. We'd need some special handling for C pointers and 
arrays, but that's also what we already have.
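
As a rough sketch of the single-rule behaviour I have in mind (Python, with a made-up `ToyValue` class standing in for a value with a child provider; this is not the LLDB API, and C pointers and arrays are deliberately left out):

```python
class ToyValue:
    """Stand-in for a debugger value with named children (hypothetical)."""

    def __init__(self, children):
        self._children = list(children)  # ordered (name, value) pairs

    def get_child_with_name(self, name):
        for child_name, value in self._children:
            if child_name == name:
                return value
        return None


def subscript(value, index):
    """Resolve value[index] with exactly one rule: the child named "[index]"."""
    child = value.get_child_with_name(f"[{index}]")
    if child is None:
        # No fallback to a child named "index" or to a positional lookup.
        raise KeyError(f"no child named [{index}]")
    return child


vec = ToyValue([("[0]", 10), ("[1]", 20)])
```

Here `subscript(vec, 1)` finds the child named `[1]`, and a missing name is an error rather than a trigger for trying another interpretation.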

>       * The expression `foo["bar"]` tries:
>         
>         * `foo.GetChildWithName("bar")`
>         * `foo.GetChildWithName("[bar]")`
>       * The expression `foo[<expr>]` -- `expr` is evaluated and treated as 
> cases above. If the result of `expr` is not a number or a string -- produce 
> an error

I think this is an interesting idea whose usefulness mainly depends on what 
kind of string operations we have in the language. If there's no way to 
construct strings, then I don't think it's very useful. You could write things 
like `some_variable[another_variable]`, but I don't think that's going to be 
useful because for static types (e.g. C structs) you're unlikely to have a 
variable with a value that contains the name of a field, and for types with 
synthetic children the names of the children are not going to have any relation 
to the way it's seen by the code (`map<string, string>` is still going to have a 
child called `[0]`).

Overall I'd leave this out for the time being because it doesn't impact parsing 
or lexing, just how some (currently invalid) syntax trees can be interpreted.

> For cases where the `GetChildWithName( "1" / "[1]" )` and 
> `GetChildAtIndex(1)` produce different valid results we could show a 
> warning/error to make it clear to the user that there's some ambiguity. IMO 
> this would be an improvement over the current situation where `print` and 
> `expr` simply produce different results.

I'm not sure how this is going to help. I assume you're referring to the 
`map<int, int>` scenario. In this case, the map object is not going to have a 
child called `"1"`, even if it happens to contain a key with the value `1`. 
(Depending on the other keys, the name of the child containing it could be 
`"[0]"`, `"[1]"`, or `"[47]"`). Or are you proposing to change that?
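
To illustrate what I mean, here is a small Python model. It assumes the synthetic children are named by their position in key-sorted iteration order, and both helper functions are hypothetical, not LLDB code:

```python
def synthetic_children(mapping):
    """Model a map's synthetic children as ("[i]", (key, value)) pairs.

    Assumption: children are named by position, and iteration is in key
    order, as it would be for a std::map.
    """
    return [(f"[{i}]", pair) for i, pair in enumerate(sorted(mapping.items()))]


def child_name_for_key(mapping, key):
    """Return the name of the child holding `key`, or None if absent."""
    for name, (k, _value) in synthetic_children(mapping):
        if k == key:
            return name
    return None
```

In this model the key `1` lives in the child `[0]` for `{1: 100}`, but in `[1]` for `{0: 5, 1: 100}`. In neither case is there a child named `1`.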

https://github.com/llvm/llvm-project/pull/123521