Hi,

On Thu, 2023-03-16 at 10:28 +0100, Thomas Schwinge wrote:
> I'm now also putting Mark Wielaard in CC; he once also started discussing
> this topic, "thinking of importing a couple of gnulib modules to help
> with UTF-8 processing [unless] other gcc frontends handle [these things]
> already in a way that might be reusable".  See the thread starting at
> <https://inbox.sourceware.org/gcc/ypqrmbhyu3wrp...@wildebeest.org>
> "rust frontend and UTF-8/unicode processing/properties".
Thanks. BTW, I am not currently working on this.

Note the responses in the above thread by Ian and Jason, who pointed out
that some of the requirements of the gccrs frontend might be covered in
the Go frontend and libcpp, but not really in a reusable way.

One other thing you might want to coordinate on is NFC normalization and
Confusable Detection for identifiers.
https://unicode.org/reports/tr39/#Confusable_Detection

There has been some work on this by David Malcolm and Marek Polacek:
https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12
But that works at a slightly higher source level (it is not specific to
identifiers).

You might want to research whether NFC normalization of identifiers is
required to be done by the lexer or the parser in Rust, and how it
interacts with proc macros.

Cheers,

Mark
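To illustrate why NFC normalization and confusable detection are separate concerns for identifiers, here is a minimal sketch in Python using the standard `unicodedata` module. It is not gccrs or GCC code, and the helper name `idents_equal` is hypothetical. NFC makes precomposed and decomposed spellings of the same character compare equal. It does not fold visually similar characters from different scripts, which is what UTS #39 confusable detection addresses.

```python
import unicodedata


def idents_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two identifiers by their NFC forms (UAX #15)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"  # 'e' with acute accent as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Raw code-point comparison treats these as different identifiers.
print(precomposed == decomposed)  # False
# After NFC normalization both become "caf\u00e9", so they compare equal.
print(idents_equal(precomposed, decomposed))  # True

# NFC does NOT map Cyrillic U+0430 to Latin 'a'; catching that lookalike
# needs confusable detection (UTS #39), not normalization.
print(idents_equal("a", "\u0430"))  # False
```

The design point the sketch makes: a lexer that normalizes identifiers to NFC gets the first case right but still accepts the lookalike pair as two distinct names. Flagging that pair would need a separate skeleton-based check over the UTS #39 confusables data.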