abhina.sreeskantharajan marked 8 inline comments as done. abhina.sreeskantharajan added inline comments.
================ Comment at: clang/include/clang/Lex/LiteralSupport.h:244 bool Pascal; + ConversionState TranslationState; + ---------------- tahonermann wrote: > abhina.sreeskantharajan wrote: > > tahonermann wrote: > > > Same concern here with respect to persisting the conversion state as a > > > data member. > > If this member is removed in StringLiteralParser, we will need to pass the > > State to multiple functions in StringLiteralParser like init(). Would this > > solution be preferable to keeping a data member? > I think so, yes. Data members should be used to reflect the state of the > object, not as a convenient mechanism to avoid passing arguments. Thanks, I've removed this member. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1323 + TranslationState = translationState; + if (Kind == tok::wide_string_literal) + TranslationState = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) ---------------- tahonermann wrote: > abhina.sreeskantharajan wrote: > > tahonermann wrote: > > > Converting wide character literals to the system encoding doesn't seem > > > right to me. For z/OS, this should presumably convert to the wide EBCDIC > > > encoding, but for all other supported platforms, the wide execution > > > character set is either UTF-16 or UTF-32 depending on the size of > > > `wchar_t` (which may be influenced by the `-fshort-wchar` option). > > Since we don't implement -fwide-exec-charset yet, what do you think should > > be the default behaviour for the interim? > Perhaps an Internal compiler error to indicate that appropriate support is > not yet in place? Thanks for the suggestion. I've added assertions for wide character translation before we do any translation. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1594-1595 + ConversionState State = TranslationState; + if (Kind == tok::wide_string_literal) + State = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) ---------------- tahonermann wrote: > Converting wide string literals to the system encoding doesn't seem right to > me. For z/OS, this should presumably convert to the wide EBCDIC encoding, > but for all other supported platforms, the wide execution character set is > either UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be > influenced by the `-fshort-wchar` option). I've now added an assertion when translating wide characters. ================ Comment at: clang/test/CodeGen/systemz-charset.cpp:1-25 +// RUN: %clang %s -std=c++17 -emit-llvm -S -target s390x-ibm-zos -o - | FileCheck %s + +const char *RawString = R"(Hello\n)"; +//CHECK: c"\C8\85\93\93\96\E0\95\00" + +char UnicodeChar8 = u8'1'; +//CHECK: i8 49 ---------------- tahonermann wrote: > This is good. I suggest adding escape sequences and UCNs to validate that > they are not converted to IBM-1047. Good idea, I added those testcases as per your suggestion. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93031/new/ https://reviews.llvm.org/D93031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits