tahonermann added inline comments.

================
Comment at: clang/include/clang/Driver/Options.td:3583-3584
 
+def fexec_charset : Separate<["-"], "fexec-charset">, 
MetaVarName<"<codepage>">,
+  HelpText<"Set the execution <codepage> for string and character literals">;
 def target_cpu : Separate<["-"], "target-cpu">,
----------------
How about substituting "character set", "character encoding", or "charset" for 
"codepage"?

This doesn't state what names are recognized.  The ones provided by the system 
iconv() implementation (as is the case for gcc)?  Or all names and aliases 
specified by the [[ 
https://www.iana.org/assignments/character-sets/character-sets.xhtml | IANA 
character set registry ]]?

The set of recognized names can be a superset of the names that are actually 
supported.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:192
+                    ConversionState translationState = TranslateToExecCharset);
+  ConversionState TranslationState;
 
----------------
Does the conversion state need to be persisted as a data member?  The literal 
is consumed in the constructor.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:244
   bool Pascal;
+  ConversionState TranslationState;
+
----------------
Same concern here with respect to persisting the conversion state as a data 
member.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:246
+
+  static LiteralTranslator Translator;
 
----------------
This static data member will presumably need to be lifted to per-instance state 
as Richard mentioned elsewhere.


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:5969-5977
+  // Pass all -fexec-charset options to cc1.
+  std::vector<std::string> vList =
+      Args.getAllArgValues(options::OPT_fexec_charset_EQ);
+  // Set the default fexec-charset as the system charset.
+  CmdArgs.push_back("-fexec-charset");
+  CmdArgs.push_back(Args.MakeArgString(Triple.getSystemCharset()));
+  for (auto it = vList.begin(), ie = vList.end(); it != ie; ++it) {
----------------
I think it would be preferable to diagnose an unrecognized character encoding 
name here if possible.  The current changes will result in an unrecognized name 
(as opposed to one that is unsupported for the target) being diagnosed for each 
compiler instance.


================
Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3573
+    StringRef Value = ExecCharset->getValue();
+    Opts.ExecCharset = (std::string)Value;
+  }
----------------
I wouldn't expect the cast to `std::string` to be needed here.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:231-239
+  if (Translate && Converter) {
+    // ResultChar is either UTF-8 or ASCII literal and can only be converted
+    // to EBCDIC on z/OS if the character can be represented in one byte.
+    if (ResultChar < 0x100) {
+      SmallString<8> ResultCharConv;
+      Converter->convert(StringRef((char *)&ResultChar), ResultCharConv);
+      void *Pointer = &ResultChar;
----------------
rsmith wrote:
> Is it correct, in general, to do character-at-a-time translation here, when 
> processing a string literal? I would expect there to be some (stateful) 
> target character sets where that's not correct.
For stateful encodings, I can imagine that state would have to be transitioned 
to the initial state before translating the escape sequence.  I suspect support 
for stateful encodings is not a goal at this time.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:234
+    // to EBCDIC on z/OS if the character can be represented in one byte.
+    if (ResultChar < 0x100) {
+      SmallString<8> ResultCharConv;
----------------
What should happen if `ResultChar` >= 0x100?  IBM-1047 does have representation 
for other UTF-8 characters.  Regardless, it seems `ResultChar` should be 
converted to something.


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1742-1745
           EncodeUCNEscape(ThisTokBegin, ThisTokBuf, ThisTokEnd,
                           ResultPtr, hadError,
                           FullSourceLoc(StringToks[i].getLocation(), SM),
                           CharByteWidth, Diags, Features);
----------------
UCNs will require conversion here.


================
Comment at: llvm/lib/Support/Triple.cpp:1028-1029
+StringRef Triple::getSystemCharset() const {
+  if (getOS() == llvm::Triple::ZOS)
+    return "IBM-1047";
+  return "UTF-8";
----------------
No support for targeting the z/OS Enhanced ASCII run-time?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to