tahonermann added inline comments.

================
Comment at: clang/include/clang/Driver/Options.td:3583-3584
 
+def fexec_charset : Separate<["-"], "fexec-charset">, 
MetaVarName<"<codepage>">,
+  HelpText<"Set the execution <codepage> for string and character literals">;
 def target_cpu : Separate<["-"], "target-cpu">,
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > How about substituting "character set", "character encoding", or "charset" 
> > for "codepage"?
> > 
> > This doesn't state what names are recognized.  The ones provided by the 
> > system iconv() implementation (as is the case for gcc)?  Or all names and 
> > aliases specified by the [[ 
> > https://www.iana.org/assignments/character-sets/character-sets.xhtml | IANA 
> > character set registry ]]?
> > 
> > The set of recognized names can be a superset of the names that are 
> > actually supported.
> I've updated the description from codepage to charset.
> 
> It's hard to specify what charsets are supported because iconv library 
> differs between targets, so the list will not be the same on every platform.
Being dependent on the host iconv library seems fine by me; that is the case 
for gcc today.  I suggest making that explicit here:
  def fexec_charset : Separate<["-"], "fexec-charset">, 
MetaVarName<"<charset>">,
    HelpText<"Set the execution <charset> for string and character literals.  
Supported character encodings include XXX and those supported by the host iconv 
library.">;


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5969-5977
+  // Pass all -fexec-charset options to cc1.
+  std::vector<std::string> vList =
+      Args.getAllArgValues(options::OPT_fexec_charset_EQ);
+  // Set the default fexec-charset as the system charset.
+  CmdArgs.push_back("-fexec-charset");
+  CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset()));
+  for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) {
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > I think it would be preferable to diagnose an unrecognized character 
> > encoding name here if possible.  The current changes will result in an 
> > unrecognized name (as opposed to one that is unsupported for the target) 
> > being diagnosed for each compiler instance.
> Since we do not know what charsets are supported by the iconv library on the 
> target platform, we don't know what charsets are actually invalid until we 
> try creating a CharSetConverter.
Understood, but what would be the harm in performing a lookup (constructing a 
`CharSetConverter`) here?


================
Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3573
+    StringRef Value = ExecCharset->getValue();
+    Opts.ExecCharset = (std::string)Value;
+  }
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > I wouldn't expect the cast to `std::string` to be needed here.
> Without that cast, I get the following build error:
> ```
>  error: no viable overloaded '='
> ```
Ok, rather than a cast, I suggest:
  Opts.ExecCharset = Value.str();


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1323
+  TranslationState = translationState;
+  if (Kind == tok::wide_string_literal)
+    TranslationState = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
----------------
Converting wide character literals to the system encoding doesn't seem right to 
me.  For z/OS, this should presumably convert to the wide EBCDIC encoding, but 
for all other supported platforms, the wide execution character set is either 
UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by 
the `-fshort-wchar` option).


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597
+  ConversionState State = TranslationState;
+  if (Kind == tok::wide_string_literal)
+    State = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
+    State = NoTranslation;
----------------
The stored `TranslationState` should not be completely ignored for wide and UTF 
string literals.  The standard permits things like the following.
  #pragma rigoot L"bozit"
  #pragma rigoot u"bozit"
  _Pragma(L"rigoot bozit")
  _Pragma(u8"rigoot bozit")
For at least the `_Pragma(L"...")` case, the C++ standard [[ 
http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it 
doesn't say anything about other encoding prefixes.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1594-1595
+  ConversionState State = TranslationState;
+  if (Kind == tok::wide_string_literal)
+    State = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
----------------
Converting wide string literals to the system encoding doesn't seem right to 
me.  For z/OS, this should presumably convert to the wide EBCDIC encoding, but 
for all other supported platforms, the wide execution character set is either 
UTF-16 or UTF-32 depending on the size of `wchar_t` (which may be influenced by 
the `-fshort-wchar` option).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to