[Lldb-commits] [lldb] [LLDB] Add Lexer (with tests) for DIL (Data Inspection Language). (PR #123521)

via lldb-commits Mon, 03 Feb 2025 10:02:22 -0800

jimingham wrote:



> On Feb 2, 2025, at 9:49 PM, cmtice ***@***.***> wrote:
> 
> 
> Apart from the (mainly stylistic) inline comments, the biggest problem I see 
> is that the definition of an identifier is still too narrow. The restriction 
> on the dollar sign is completely unnecessary as C will let you put that 
> anywhere <https://godbolt.org/z/o7qbfeWve>. And it doesn't allow any 
> non-ASCII characters.
> 
> I really think this should be based on a deny- rather than an allow-list. 
> Any character we don't claim for ourselves should be fair game for an 
> identifier. If someone manages to enter the delete character (\x7f) into 
> the expression, then so be it.
> 
> The question of "identifiers" starting with digits is interesting. 
> Personally, I think it'd be fine to reject those (and require the 
> currently-vapourware quoting syntax), because I suspect you want to accept 
> number suffixes, and I think it'd be confusing to explain why 123x is a valid 
> identifier but 123u is not, but I suspect some might have a different opinion.
> 
> We could continue discussing that here, or we could accept everything here, 
> and postpone this discussion for the patch which starts parsing numbers. Up 
> to you.
> 
> To the best of my knowledge, all the languages that we want to support have 
> roughly the same definition of what a valid identifier is: a letter or 
> underscore, followed by a sequence of letters, digits and underscores, where 
> 'letters' are defined as 'a..z' and 'A..Z'. The ones I've been able to check 
> do not allow arbitrary characters in their identifiers. So that's what I 
> implemented (acknowledging that I only recognize ASCII at the moment, and 
> fully plan to add UTF-8 support in the future). I added the 
> ability to recognize the '$' at the front specifically to allow DIL users to 
> ask about registers and LLDB convenience variables, which (to the best of my 
> knowledge) allow '$' only in the first position, and not all by itself.
> 

> I am not sure I see the benefits of expanding what DIL recognizes as a valid 
> identifier beyond what the languages LLDB supports recognize? Am I missing 
> something? Or (this is quite possible) have I misunderstood the definition of 
> what's a valid identifier for some language we want to support?
> 
> Since we definitely want to support lexing/parsing of numbers, I do not think 
> it's a good idea for DIL to also allow identifiers to start with numbers.
> 

I agree here.  We will definitely need to support UTF-8 characters, since all
the hip new languages use that character set.  But allowing initial digits
makes parsing sufficiently hard that I don't think it likely there will be
languages we need to support that do that.  Can anybody even think of a
language that allows this?
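
For concreteness, here is a minimal sketch of the allow-list rule as it is
described in the quoted text: an optional leading '$', then a letter or
underscore, then letters, digits and underscores, ASCII only.  The function
names are hypothetical and are not taken from the PR.  Treating "$1" as
invalid (because the character after '$' must still be a letter or
underscore) is an assumption of this sketch, not something the thread settles.

```cpp
#include <string_view>

// Hypothetical helpers; the names are illustrative, not from the PR.
static bool IsAsciiLetter(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Allow-list rule as described in the thread: an optional leading '$', then
// a letter or underscore, then any run of letters, digits and underscores.
// A lone "$" and anything starting with a digit are rejected.
// Assumption: the character after '$' follows the normal first-character
// rule, so "$1" is rejected here.
bool IsDILIdentifier(std::string_view s) {
  if (!s.empty() && s.front() == '$')
    s.remove_prefix(1);
  if (s.empty())
    return false;
  if (!IsAsciiLetter(s.front()) && s.front() != '_')
    return false;
  for (char c : s.substr(1))
    if (!IsAsciiLetter(c) && !IsAsciiDigit(c) && c != '_')
      return false;
  return true;
}
```

Under this rule "foo_1" and "$rax" are accepted, while "$", "a$b" and "123x"
are rejected, so a leading digit can always be handed to the number lexer.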

Jim




https://github.com/llvm/llvm-project/pull/123521
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
