tahonermann added inline comments.
================ Comment at: clang/include/clang/Driver/Options.td:3583-3584 +def fexec_charset : Separate<["-"], "fexec-charset">, MetaVarName<"<codepage>">, + HelpText<"Set the execution <codepage> for string and character literals">; def target_cpu : Separate<["-"], "target-cpu">, ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > How about substituting "character set", "character encoding", or "charset" > > for "codepage"? > > > > This doesn't state what names are recognized. The ones provided by the > > system iconv() implementation (as is the case for gcc)? Or all names and > > aliases specified by the [[ > > https://www.iana.org/assignments/character-sets/character-sets.xhtml | IANA > > character set registry ]]? > > > > The set of recognized names can be a superset of the names that are > > actually supported. > I've updated the description from codepage to charset. > > It's hard to specify what charsets are supported because iconv library > differs between targets, so the list will not be the same on every platform. Being dependent on the host iconv library seems fine by me; that is the case for gcc today. I suggest making that explicit here: def fexec_charset : Separate<["-"], "fexec-charset">, MetaVarName<"<charset>">, HelpText<"Set the execution <charset> for string and character literals. Supported character encodings include XXX and those supported by the host iconv library.">; ================ Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5969-5977 + // Pass all -fexec-charset options to cc1. + std::vector<std::string> vList = + Args.getAllArgValues(options::OPT_fexec_charset_EQ); + // Set the default fexec-charset as the system charset. + CmdArgs.push_back("-fexec-charset"); + CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset())); + for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) { ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > I think it would be preferable to diagnose an unrecognized character > > encoding name here if possible. The current changes will result in an > > unrecognized name (as opposed to one that is unsupported for the target) > > being diagnosed for each compiler instance. > Since we do not know what charsets are supported by the iconv library on the > target platform, we don't know what charsets are actually invalid until we > try creating a CharSetConverter. Understood, but what would be the harm in performing a lookup (constructing a `CharSetConverter`) here? ================ Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3573 + StringRef Value = ExecCharset->getValue(); + Opts.ExecCharset = (std::string)Value; + } ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > I wouldn't expect the cast to `std::string` to be needed here. > Without that cast, I get the following build error: > ``` > error: no viable overloaded '=' > ``` Ok, rather than a cast, I suggest: Opts.ExecCharset = Value.str(); ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1323 + TranslationState = translationState; + if (Kind == tok::wide_string_literal) + TranslationState = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) ---------------- Converting wide character literals to the system encoding doesn't seem right to me. For z/OS, this should presumably convert to the wide EBCDIC encoding, but for all other supported platforms, the wide execution character set is either UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by the `-fshort-wchar` option). ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597 + ConversionState State = TranslationState; + if (Kind == tok::wide_string_literal) + State = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) + State = NoTranslation; ---------------- The stored `TranslationState` should not be completely ignored for wide and UTF string literals. The standard permits things like the following. #pragma rigoot L"bozit" #pragma rigoot u"bozit" _Pragma(L"rigoot bozit") _Pragma(u8"rigoot bozit") For at least the `_Pragma(L"...")` case, the C++ standard [[ http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it doesn't say anything about other encoding prefixes. ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1594-1595 + ConversionState State = TranslationState; + if (Kind == tok::wide_string_literal) + State = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) ---------------- Converting wide string literals to the system encoding doesn't seem right to me. For z/OS, this should presumably convert to the wide EBCDIC encoding, but for all other supported platforms, the wide execution character set is either UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by the `-fshort-wchar` option). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93031/new/ https://reviews.llvm.org/D93031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits