tahonermann added a comment.

Hi, Abhina.  Sorry for the delay getting back to you.  I added some more 
comments.



================
Comment at: clang/include/clang/Lex/LiteralSupport.h:191
+                    Preprocessor &PP, tok::TokenKind kind,
+                    ConversionState TranslationState = TranslateToExecCharset);
 
----------------
Is the default argument for `TranslationState` actually used anywhere?  I'm 
skeptical that a default argument provides a benefit here.

Actually, this diff doesn't include any changes to construct a 
`CharLiteralParser` with an explicit argument.  It seems this argument isn't 
actually needed.

The only places I see objects of `CharLiteralParser` type constructed are in 
`EvaluateValue()` in `clang/lib/Lex/PPExpressions.cpp` and 
`Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:238-239
+        ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {
+    LT = new LiteralTranslator();
+    LT->setTranslationTables(Features, Target, *Diags);
+    init(StringToks, TranslationState);
----------------
I don't think a `LiteralTranslator` object is actually needed in this case.  
The only use of this constructor that I see is in 
`ModuleMapParser::consumeToken()` in `clang/lib/Lex/ModuleMap.cpp` and, in that 
case, I don't think any translation is necessary.

This suggests that `TranslationState` is not needed for this constructor 
either; `NoTranslation` can be passed to `init()`.


================
Comment at: clang/include/clang/Lex/LiteralSupport.h:260-262
+  unsigned getOffsetOfStringByte(
+      const Token &TheTok, unsigned ByteNo,
+      ConversionState TranslationState = TranslateToExecCharset) const;
----------------
I don't think `getOffsetOfStringByte()` should require a `ConversionState` 
parameter.  If I understand it correctly, this function should be operating on 
the string in the internal encoding, never in a converted encoding.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:19-23
+enum ConversionState {
+  NoTranslation,
+  TranslateToSystemCharset,
+  TranslateToExecCharset
+};
----------------
Some naming suggestions...  The enumeration is not used to record a state, but 
rather to indicate an action to take.  Also, use of both "conversion" and 
"translation" could be confusing, so I suggest sticking with one.  Perhaps:
  enum class LiteralConversion {
    None,
    ToSystemCharset,
    ToExecCharset
  };


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:30-31
+
+class LiteralTranslator {
+public:
+  llvm::StringRef InternalCharset;
----------------
I don't know the LLVM style guides well, but I suspect a class with all public 
members should be defined using `struct` and not include access specifiers.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:35
+  llvm::StringRef ExecCharset;
+  llvm::StringMap<llvm::CharSetConverter> ExecCharsetTables;
+
----------------
Given the converter setters and accessors below, `ExecCharsetTables` should be 
a private member.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:37
+
+  llvm::CharSetConverter *getConversionTable(const char *Codepage);
+  CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
----------------
`getConversionTable()` is logically `const`.  Perhaps `ExecCharsetTables` 
should be `mutable`.

From a terminology stand point, this function is misnamed.  It doesn't return a 
table, it returns a converter for an encoding.  I suggest:
  llvm::CharSetConverter *getCharSetConverter(const char *Encoding) const;


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:38
+  llvm::CharSetConverter *getConversionTable(const char *Codepage);
+  CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To);
+  llvm::CharSetConverter *
----------------
`findOrCreateExecCharsetTable()` seems oddly named since it doesn't return 
whatever it finds or creates.  It seems like this function would be more useful 
if it returned a `llvm::CharSetConverter` pointer with `nullptr` indicating 
lookup/creation failed.

This function seems like it should be an implementation detail of the class, 
not a public interface.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:41-43
+  void setTranslationTables(const clang::LangOptions &Opts,
+                            const clang::TargetInfo &TInfo,
+                            clang::DiagnosticsEngine &Diags);
----------------
`setTranslationTables()` is awkward.  It is effectively operating as a 
constructor for the class, but isn't called at object construction and it does 
work that goes beyond initialization.


================
Comment at: clang/include/clang/Lex/LiteralTranslator.h:44
+                            clang::DiagnosticsEngine &Diags);
+};
+
----------------
I suggest trying a design more like this:
  class LiteralTranslator {
    std::string SystemEncoding;
    std::string ExecutionEncoding;
  public:
    LiteralTranslator(llvm::StringRef SystemEncoding,
                      llvm::StringRef ExecutionEncoding);

    // Retrieve the name for the system encoding.
    llvm::StringRef getSystemEncoding() const;
    // Retrieve the name for the execution encoding.
    llvm::StringRef getExecutionEncoding() const;

    // Retrieve a converter for converting from the internal encoding (UTF-8)
    // to the system encoding.
    llvm::CharSetConverter* getSystemEncodingConverter() const;
    // Retrieve a converter for converting from the internal encoding (UTF-8)
    // to the execution encoding.
    llvm::CharSetConverter* getExecutionEncodingConverter() const;
  };
  
  LiteralTranslator createLiteralTranslatorFromOptions(const clang::LangOptions 
&Opts,
                                                       const clang::TargetInfo 
&TInfo,
                                                       clang::DiagnosticsEngine 
&Diags);


================
Comment at: clang/include/clang/Lex/Preprocessor.h:145
   ModuleLoader      &TheModuleLoader;
+  LiteralTranslator *LT = nullptr;
 
----------------
I don't see a reason for `LT` to be a pointer.  Can it be made a reference or, 
better, a non-reference, non-pointer data member?


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1324
+  ConversionState State = isUTFLiteral(Kind) ? NoTranslation : 
TranslationState;
+  llvm::CharSetConverter *Converter =
+      LT ? LT->getCharConversionTable(State) : nullptr;
----------------
Per the comment associated with the constructor declaration, I don't think the 
new constructor parameter is needed; translation to execution character set is 
always desired for non-UTF character literals.  I think this can be something 
like:
  llvm::CharSetConverter *Converter = nullptr;
  if (! isUTFLiteral(Kind)) {
    assert(LT);
    Converter = LT->getCharConversionTable(TranslateToExecCharset);
  }


================
Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597
+  ConversionState State = TranslationState;
+  if (Kind == tok::wide_string_literal)
+    State = TranslateToSystemCharset;
+  else if (isUTFLiteral(Kind))
+    State = NoTranslation;
----------------
abhina.sreeskantharajan wrote:
> tahonermann wrote:
> > The stored `TranslationState` should not be completely ignored for wide and 
> > UTF string literals.  The standard permits things like the following.
> >   #pragma rigoot L"bozit"
> >   #pragma rigoot u"bozit"
> >   _Pragma(L"rigoot bozit")
> >   _Pragma(u8"rigoot bozit")
> > For at least the `_Pragma(L"...")` case, the C++ standard [[ 
> > http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it 
> > doesn't say anything about other encoding prefixes.
> Please correct me if I'm wrong, these Pragma strings are not parsed through 
> StringLiteralParser, they are parsed in **clang/lib/Lex/Pragma.cpp** in this 
> function.
> **void Preprocessor::Handle_Pragma(Token &Tok)**
> So if they require translation, it would need to be done in that function.
> 
Ah, ok, good.  There are other cases where a string literal is not used to 
produce a string literal object.  See https://wg21.link/p2314 for a table.  You 
may want to audit for those cases.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:88
       ScratchBuf(new ScratchBuffer(SourceMgr)), HeaderInfo(Headers),
-      TheModuleLoader(TheModuleLoader), ExternalSource(nullptr),
+      TheModuleLoader(TheModuleLoader), LT(new LiteralTranslator()),
+      ExternalSource(nullptr),
----------------
Per comments elsewhere, please try to make `LT` a non-pointer non-reference 
data member.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93031/new/

https://reviews.llvm.org/D93031

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to