jimingham wrote:
> On Feb 2, 2025, at 9:49 PM, cmtice <***@***.***> wrote:
>
> Apart from the (mainly stylistic) inline comments, the biggest problem I see
> is that the definition of an identifier is still too narrow. The restriction
> on the dollar sign is completely unnecessary as C will let you put that
> anywhere <https://godbolt.org/z/o7qbfeWve>. And it doesn't allow any
> non-ASCII characters.
>
> I really think this should be based on a deny- rather than an allow-list.
> Any character we don't claim for ourselves should be fair game for an
> identifier. If someone manages to enter the delete character (\x7f) into
> the expression, then so be it.
>
> The question of "identifiers" starting with digits is interesting.
> Personally, I think it'd be fine to reject those (and require the
> currently-vapourware quoting syntax), because I suspect you want to accept
> number suffixes, and I think it'd be confusing to explain why 123x is a valid
> identifier but 123u is not, but I suspect some might have a different opinion.
>
> We could continue discussing that here, or we could accept everything here,
> and postpone this discussion for the patch which starts parsing numbers. Up
> to you.
>
> To the best of my knowledge, all the languages that we want to support have
> roughly the same definition of what a valid identifier is: a letter or
> underscore, followed by a sequence of letters, digits and underscores, where
> 'letters' are defined as 'a..z' and 'A..Z'. The ones I've been able to check
> do not allow arbitrary characters in their identifiers. So that's what I
> implemented (acknowledging that I currently only recognize ASCII, and fully
> plan to add UTF-8 support in the future). I added the ability to recognize
> the '$' at the front specifically to allow DIL users to ask about registers
> and LLDB convenience variables, which (to the best of my knowledge) allow '$'
> only in the first position, and not all by itself.
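
For illustration only, the rule described in the quoted paragraph above could
be sketched roughly as follows. This is not the code from the PR: the function
and helper names are hypothetical, it checks a whole string rather than lexing
a token out of a larger expression, and it assumes that the character right
after a leading '$' must also be a letter or underscore (so something like
"$0" is rejected here, which may or may not match what the patch does).

```cpp
#include <string_view>

// Hypothetical helpers, ASCII only, as in the description above.
static bool IsAsciiLetter(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical name. Returns true if the whole of `text` is an identifier:
// an optional leading '$', then a letter or underscore, then any number of
// letters, digits, or underscores.
bool IsDilIdentifier(std::string_view text) {
  // A single '$' is allowed in the first position only.
  if (!text.empty() && text.front() == '$')
    text.remove_prefix(1);

  // Rejects the empty string and a '$' all by itself.
  if (text.empty())
    return false;

  // The first remaining character must be a letter or underscore, so
  // anything starting with a digit is not an identifier.
  if (!IsAsciiLetter(text.front()) && text.front() != '_')
    return false;

  // Every following character must be a letter, digit, or underscore.
  for (char c : text.substr(1)) {
    if (!IsAsciiLetter(c) && !IsAsciiDigit(c) && c != '_')
      return false;
  }
  return true;
}
```

Under these assumptions "foo", "_bar1" and "$rax" are accepted, while "$",
"123x" and "a$b" are rejected. A deny-list version, as suggested in the
review, would instead reject only the characters DIL claims for itself.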
>
> I am not sure I see the benefits of expanding what DIL recognizes as a valid
> identifier beyond what the languages LLDB supports recognize. Am I missing
> something? Or (this is quite possible) have I misunderstood the definition of
> what's a valid identifier for some language we want to support?
>
> Since we definitely want to support lexing/parsing of numbers, I do not think
> it's a good idea for DIL to also allow identifiers to start with numbers.

I agree here. We definitely will need to support UTF-8 characters; all the hip
new languages use that character set. But allowing initial digits makes
parsing sufficiently hard that I don't think it likely there will be languages
we need to support that do that. Can somebody even think of a language that
allows this?

Jim

> Reply to this email directly, view it on GitHub
> <https://github.com/llvm/llvm-project/pull/123521#issuecomment-2630033172>.

https://github.com/llvm/llvm-project/pull/123521
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits