labath wrote:

> To the best of my knowledge, all the languages that we want to support have
> roughly the same definition of what a valid identifier is: a letter or
> underscore, followed by a sequence of letters, digits and underscores, where
> 'letters' are defined as 'a..z' and 'A..Z'. The one's I've been able to check
> do not allow arbitrary characters in their identifiers. So that's what I
> implemented (acknowledging that I currently only recognize ascii at the
> moment, and fully plan to add utf8 support in the future). I added the
> ability to recognize the '$' at the front specifically to allow DIL users to
> ask about registers and LLDB convenience variables, which (to the best of my
> knowledge) allow '$' only in the first position, and not all by itself.

I don't know how you were checking that, but I'm certain that's not the case. I [already gave you](https://godbolt.org/z/o7qbfeWve) an example of C code which contradicts all of these (note that there can be a difference between what's considered a valid identifier by the specification of a language, and what an actual compiler for that language will accept). And I'm not even mentioning all of the names that can be constructed by synthetic child providers.

You say you want to add UTF-8 support. How do you intend to do that? Do you want to enumerate all of the supported characters in each language? Check which language supports variable names in Klingon? Some of the rules can be really obscure. For example, Java accepts £ (`\xA3`) as a variable name, but not © (`\xa9`). I'm sure they had some reason to choose that, but I'd rather not have to find that out. OTOH, if you just accept all of the high-bit (non-ASCII) byte values as valid characters, then you can support UTF-8 with a single line of code. And you're not regressing anything, because that's exactly what the current implementation does.

I don't think this list has to be set in stone. For example, `frame variable` currently accepts `@` as a variable name. I believe you don't have any plans for that operator, so I'd just stick to that. If we can come up with some fancy use for it (maybe as an escape character?), then I'm certainly open to changing its classification.

> I am not sure I see that benefits of expanding what DIL recognizes as a valid
> identifier beyond what the languages LLDB supports recognize?

For me the main benefits are:
- simplicity of the implementation
- being able to express a wide range of variable names, even for languages we don't support right now
- matching the status quo

That said, I would like to hear what you think are the benefits of *not* recognizing a wider set of identifier values. And I'm not talking about names like `123foo` (it sounds like there's consensus to ban those).
I'm thinking more of names like `$`, `foo$`, `💩`, etc.

https://github.com/llvm/llvm-project/pull/123521
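
As a rough illustration of the "accept all of the high-bit values" approach described above, a permissive identifier check could look something like the sketch below. The function names (`IsIdentifierStart`, `IsIdentifierBody`, `LexIdentifier`) and the exact character set are assumptions made up for this example; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; not the DIL lexer from the PR.

// Bytes allowed at the start of an identifier: ASCII letters, '_', '$', '@',
// and any byte with the high bit set. The last clause is the "single line"
// that lets every multi-byte UTF-8 sequence through without a per-language
// table of valid characters. Including '@' is an assumption based on what
// `frame variable` is said to accept today.
static bool IsIdentifierStart(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' ||
         c == '$' || c == '@' || c >= 0x80;
}

// Bytes allowed after the first one: everything above, plus ASCII digits.
static bool IsIdentifierBody(unsigned char c) {
  return IsIdentifierStart(c) || (c >= '0' && c <= '9');
}

// Returns the length in bytes of the identifier at the start of `text`,
// or 0 if `text` does not start with an identifier.
static std::size_t LexIdentifier(std::string_view text) {
  if (text.empty() || !IsIdentifierStart(static_cast<unsigned char>(text[0])))
    return 0;
  std::size_t n = 1;
  while (n < text.size() &&
         IsIdentifierBody(static_cast<unsigned char>(text[n])))
    ++n;
  return n;
}
```

Under this rule `$`, `foo$` and `💩` all lex as identifiers, while `123foo` does not, because a digit cannot start a name.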