aaron.ballman added a comment.

In D106215#2943631 <https://reviews.llvm.org/D106215#2943631>, @cor3ntin wrote:

> In D106215#2943611 <https://reviews.llvm.org/D106215#2943611>, @aaron.ballman 
> wrote:
>
>> I think that C and C++ should behave the same here; at least, I don't see 
>> any reason why they should have different capabilities.
>
> I agree, but as WG14 hasn't weighed in I didn't want to make that call.
> What do you think?

My reading of C2x is that this is implementation-defined there as well.

6.4.4.4p13:

A wide character constant prefixed by the letter L has type wchar_t, an integer
type defined in the <stddef.h> header; a wide character constant prefixed by the
letter u or U has type char16_t or char32_t, respectively, unsigned integer
types defined in the <uchar.h> header. The value of a wide character constant
containing a single multibyte character that maps to a single member of the
extended execution character set is the wide character corresponding to that
multibyte character, as defined by the mbtowc, mbrtoc16, or mbrtoc32 function as
appropriate for its type, with an implementation-defined current locale. The
value of a wide character constant containing more than one multibyte character
or a single multibyte character that maps to multiple members of the extended
execution character set, or containing a multibyte character or escape sequence
not represented in the extended execution character set, is
implementation-defined.

Do you agree?
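
For concreteness, here is a minimal C11 sketch of the cases that paragraph
distinguishes. IS_WCHAR_T is a hypothetical helper of mine, not something from
the patch or the standard, and the multi-character case is left in a comment
because its value is implementation-defined and compilers typically diagnose it.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* wchar_t */

/* Hypothetical helper: 1 if the expression has type wchar_t, else 0. */
#define IS_WCHAR_T(e) _Generic((e), wchar_t: 1, default: 0)

/* An L-prefixed constant has type wchar_t. */
static_assert(IS_WCHAR_T(L'a'), "L'a' should have type wchar_t");

/* Single character that maps to one member of the execution character set. */
static const wchar_t single_wide = L'a';

/* More than one character: the value is implementation-defined in C.
 *
 *   static const wchar_t multi_wide = L'ab';
 */
```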

>> The paper said that there is no expected code breakage from this change, but 
>> have you tried building a diverse corpus of code (like a distro's worth of 
>> packages) under this patch to see if anything actually breaks in practice? 
>> (I don't expect breakage that isn't identifying an actual issue in the code, 
>> but having some verification would be appreciated.) This would also help to 
>> identify whether the change is appropriate for C as well.
>
> We have done regexes over various repositories (every vcpkg package) with no 
> match. Not running a complete compiler

Regexes are a good start, but they miss the goofy (and sometimes awful) stuff
that people do with token pasting, line continuations, and other random tricks.
Would you be willing to try this as an experiment, or am I asking too much? :-)
My thinking is that if compiling a diverse corpus of code shows no breakage,
we've done enough due diligence to suggest this is safe for both C and C++. If
we do see breakage, there are two possibilities. Either we've found a valid use
that we had not considered (less likely), which would be informative for both
WG21 and WG14, or we've helped find bugs in real-world code (more likely), which
is also good feedback for the committees.
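
As a small illustration of what a textual search can miss, here is a sketch in
C. The WIDEN macros and function names are hypothetical and not taken from any
real project. The prefix and the quoted part are never adjacent on one line of
the source, yet the compiler ends up seeing a wide character constant in both
cases.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* wchar_t */

/* Hypothetical macros: paste an L prefix onto a character constant. */
#define WIDEN_IMPL(c) L##c
#define WIDEN(c) WIDEN_IMPL(c)

/* Token pasting: the preprocessor produces the wide constant from 'a'. */
static wchar_t widened_a(void) { return WIDEN('a'); }

/* Line continuation: the backslash-newline is removed before tokenization,
 * so the prefix and the quoted part join into one token. The second line
 * must start in column 0 for this to work. */
static wchar_t spliced_a(void) { return L\
'a'; }

static_assert(WIDEN('a') == L'a', "pasting yields the wide constant");
```

Passing a two-character constant such as 'ab' to WIDEN would produce a
multi-character wide constant in the same way, which is the kind of case only
an actual compile would surface.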


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106215/new/

https://reviews.llvm.org/D106215
