tahonermann added inline comments.
================ Comment at: clang/include/clang/Lex/LiteralSupport.h:244 bool Pascal; + ConversionState TranslationState; + ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > Same concern here with respect to persisting the conversion state as a data > > member. > If this member is removed in StringLiteralParser, we will need to pass the > State to multiple functions in StringLiteralParser like init(). Would this > solution be preferable to keeping a data member? I think so, yes. Data members should be used to reflect the state of the object, not as a convenient mechanism to avoid passing arguments. ================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5982-5984 + } else { + D.Diag(clang::diag::err_drv_invalid_value) << "-fexec-charset" << *it; + } ---------------- Thank you for adding this. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:234 + SmallString<8> ResultCharConv; + Converter->convert(std::string(1, ByteChar), ResultCharConv); + memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned)); ---------------- Conversion can fail here, particularly in the scenario corresponding to the default switch case above; `ResultChar` could contain, for example, a lead byte of a UTF-8 sequence. Something sensible should be done here; either rejecting the code with an error or substituting `?` (in the execution encoding) seems appropriate to me. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:235 + Converter->convert(std::string(1, ByteChar), ResultCharConv); + memcpy((void *)&ResultChar, ResultCharConv.data(), sizeof(unsigned)); + } ---------------- As Richard previously noted, this `memcpy()` needs to be addressed. The intended behavior here is not clear. Are there valid scenarios in which the conversion will produce a sequence of more than one code units? I believe the input is limited to ASCII characters and invalid code units (e.g., a lead byte of a UTF-8 sequence) and in the latter case, an error and/or substitution of a `?` (in the execution encoding) seem like acceptable behaviors to me. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1323 + TranslationState = translationState; + if (Kind == tok::wide_string_literal) + TranslationState = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > Converting wide character literals to the system encoding doesn't seem > > right to me. For z/OS, this should presumably convert to the wide EBCDIC > > encoding, but for all other supported platforms, the wide execution > > character set is either UTF-16 or UTF-32 depending on the size of `wchar_t` > > (which may be influenced by the `-fshort-wchar` option). > Since we don't implement -fwide-exec-charset yet, what do you think should be > the default behaviour for the interim? Perhaps an Internal compiler error to indicate that appropriate support is not yet in place? ================ Comment at: clang/test/CodeGen/systemz-charset.c:16-23 +char *EscapeCharacters = "\a\b\f\n\r\t\v\\\'\"\?"; +//CHECK: c"/\16\0C\15\0D\05\0B\E0}\7Fo\00" + +char *HexCharacters = "\x12\x13\x14"; +//CHECK: c"\12\13\14\00" + +char *OctalCharacters = "\141\142\143"; ---------------- `const char*` here too please. ================ Comment at: clang/test/CodeGen/systemz-charset.cpp:1-25 +// RUN: %clang %s -std=c++17 -emit-llvm -S -target s390x-ibm-zos -o - | FileCheck %s + +const char *RawString = R"(Hello\n)"; +//CHECK: c"\C8\85\93\93\96\E0\95\00" + +char UnicodeChar8 = u8'1'; +//CHECK: i8 49 ---------------- This is good. I suggest adding escape sequences and UCNs to validate that they are not converted to IBM-1047. ================ Comment at: clang/test/Driver/clang_f_opts.c:212-213 -// RUN: %clang -### -S -fexec-charset=iso-8859-1 -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-INVALID-INPUT-CHARSET %s -// CHECK-INVALID-INPUT-CHARSET: error: invalid value 'iso-8859-1' in '-fexec-charset=iso-8859-1' +// RUN: %clang -### -S -fexec-charset=invalid-charset -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-INVALID-INPUT-CHARSET %s +// CHECK-INVALID-INPUT-CHARSET: error: invalid value 'invalid-charset' in '-fexec-charset' ---------------- This looks good. Can tests also be added to validate that the `UTF-8`, `ISO8895-1`, and `IBM-1047` option arguments are properly recognized? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93031/new/ https://reviews.llvm.org/D93031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits