On Thu, Oct 07, 2021 at 09:12:15AM -0400, Jason Merrill wrote:
> > And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the
> > libcpp/lex.c
> >       static const cppchar_t utf8_signifier = 0xC0;
> > ...
> >           if (*buffer->cur >= utf8_signifier)
> >             {
> >               if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
> >                                    state, &s))
> >                 return true;
> >             }
> > work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> > multi-byte character, it is more complicated and seems _cpp_valid_utf8
> > assumes UTF-8 as the host charset.
>
> Are there any supported platforms that use UTF-EBCDIC?

I have no idea.  From the libcpp/charset.c code, seems there is no built-in
conversion for UTF-EBCDIC, the only internally supported conversions are
  { "UTF-8/UTF-32LE", convert_utf8_utf32, (iconv_t)0 },
  { "UTF-8/UTF-32BE", convert_utf8_utf32, (iconv_t)1 },
  { "UTF-8/UTF-16LE", convert_utf8_utf16, (iconv_t)0 },
  { "UTF-8/UTF-16BE", convert_utf8_utf16, (iconv_t)1 },
  { "UTF-32LE/UTF-8", convert_utf32_utf8, (iconv_t)0 },
  { "UTF-32BE/UTF-8", convert_utf32_utf8, (iconv_t)1 },
  { "UTF-16LE/UTF-8", convert_utf16_utf8, (iconv_t)0 },
  { "UTF-16BE/UTF-8", convert_utf16_utf8, (iconv_t)1 },
and identity, so unless the C library iconv supports conversion to
UTF-EBCDIC, the only case that could be supported is when -finput-charset=
is also UTF-EBCDIC.  E.g. glibc iconv doesn't support that.
Never used z/VM nor OS/390 which I think are the only possible hosts that
could have UTF-EBCDIC.
CCing Andreas if he knows more...

> > --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj	2021-10-07 14:34:35.182132411 +0200
> > +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C	2021-10-07 14:34:02.902583774 +0200
> > @@ -0,0 +1,33 @@
> > +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> > +// { dg-do compile }
>
> Doesn't this need to run?  OK with that change.

Thanks for catching that, fixed, retested and committed.

	Jakub
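
P.S. For reference, a minimal sketch of why the >= 0xC0 comparison is
enough when the host charset is UTF-8: bytes 0x80..0xBF are continuation
bytes there, so any byte that can start a multi-byte sequence is >= 0xC0.
The helper names below are hypothetical and are not libcpp functions; the
sketch only illustrates the byte test and makes no attempt to validate the
sequence the way _cpp_valid_utf8 does.  The same single comparison is what
does not carry over to UTF-EBCDIC.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libcpp.  In UTF-8, bytes 0x00..0x7F are
   single-byte characters and 0x80..0xBF are continuation bytes, so a byte
   that may begin a multi-byte sequence is always >= 0xC0.  */
static int
utf8_may_start_multibyte (unsigned char c)
{
  return c >= 0xC0;
}

/* Hypothetical helper: count the bytes in BUF[0..LEN) that pass the test
   above, i.e. the candidate starts of multi-byte sequences.  */
static size_t
count_utf8_lead_bytes (const unsigned char *buf, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if (utf8_may_start_multibyte (buf[i]))
      n++;
  return n;
}
```

A real lexer would still have to validate each candidate sequence
afterwards; the comparison is only a cheap first filter.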