cor3ntin marked an inline comment as done.
cor3ntin added a comment.

In D106215#2943653 <https://reviews.llvm.org/D106215#2943653>, @aaron.ballman wrote:

> In D106215#2943631 <https://reviews.llvm.org/D106215#2943631>, @cor3ntin wrote:
>
>> In D106215#2943611 <https://reviews.llvm.org/D106215#2943611>, @aaron.ballman wrote:
>>
>>> I think that C and C++ should behave the same here; at least, I don't see
>>> any reason why they should have different capabilities.
>>
>> I agree, but as WG14 hasn't weighed in I didn't want to make that call.
>> What do you think?
>
> My reading of C2x is that this is implementation-defined there as well.
>
> 6.4.4.4p13:
>
>   A wide character constant prefixed by the letter L has type wchar_t, an
>   integer type defined in the <stddef.h> header; a wide character constant
>   prefixed by the letter u or U has type char16_t or char32_t, respectively,
>   unsigned integer types defined in the <uchar.h> header. The value of a
>   wide character constant containing a single multibyte character that maps
>   to a single member of the extended execution character set is the wide
>   character corresponding to that multibyte character, as defined by the
>   mbtowc, mbrtoc16, or mbrtoc32 function as appropriate for its type, with
>   an implementation-defined current locale. The value of a wide character
>   constant containing more than one multibyte character or a single
>   multibyte character that maps to multiple members of the extended
>   execution character set, or containing a multibyte character or escape
>   sequence not represented in the extended execution character set, is
>   implementation-defined.
>
> Do you agree?

Yes, I agree. I think clang could make it ill-formed if it wanted to!
If we want to do that, we could probably remove some more code :)

>>> The paper said that there is no expected code breakage from this change,
>>> but have you tried building a diverse corpus of code (like a distro's worth
>>> of packages) under this patch to see if anything actually breaks in
>>> practice?
>>> (I don't expect breakage that isn't identifying an actual issue
>>> in the code, but having some verification would be appreciated.) This would
>>> also help to identify whether the change is appropriate for C as well.
>>
>> We have done regexes over various repositories (every vcpkg package) with no
>> match. Not running a complete compiler.
>
> Regexes are a good start but they miss the goofy (and sometimes awful) stuff
> that people do with token pasting, line continuations, and other random
> tricks. Would you be willing to try this as an experiment, or am I asking too
> much? :-) My thinking is that if we don't see any breakage from compiling a
> diverse corpus of code, we've done enough due diligence to suggest this is
> safe for both C and C++, but if we see some breakage, we can either identify
> that there's some valid use for this that we've not considered (less likely)
> and would be informative for both WG21 and WG14, or we can identify that we
> helped find bugs in real world code (more likely) which is also good feedback
> for the committees.

Unless there is a script to do that easily, I'm not sure I'll be able to get to it any time soon.
But really, there is 0 use for these things!
And you can't do much goofiness: `L ## 'ab'`, certainly, but that wouldn't be very useful either.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106215/new/

https://reviews.llvm.org/D106215

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
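
For illustration only, here is a rough sketch of the point raised in the thread that a textual regex search can miss multi-character wide literals formed by token pasting or line continuations. The helper `regexFindsWideMultichar`, the `WIDE` macro, and the pattern itself are hypothetical; they are not the regexes that were actually run over the vcpkg packages, and real source text has more cases (escape sequences, comments, string literals) than this pattern handles.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches raw source text for a wide character
// literal with two or more plain characters between the quotes (L'ab').
// Escape sequences are deliberately ignored to keep the sketch small.
bool regexFindsWideMultichar(const std::string &source) {
  static const std::regex pattern(R"(\bL'[^'\\\n]{2,}')");
  return std::regex_search(source, pattern);
}

// Hypothetical macro showing the token-pasting form mentioned in the
// thread. WIDE('a') pastes to the single-character literal L'a', which is
// unaffected. WIDE('ab') would paste to L'ab', the form under discussion,
// so it is intentionally not instantiated here.
#define WIDE(x) L##x
static_assert(WIDE('a') == L'a', "pasting L and 'a' forms L'a'");
```

Under these assumptions, the search reports `wchar_t c = L'ab';` but not `wchar_t c = WIDE('ab');`, and not a literal split by a backslash-newline, even though the compiler would see `L'ab'` in all three cases. Only an actual compile over the corpus would catch the latter two.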