tahonermann added a comment. Hi, Abhina. Sorry for the delay getting back to you. I added some more comments.
================ Comment at: clang/include/clang/Lex/LiteralSupport.h:191 + Preprocessor &PP, tok::TokenKind kind, + ConversionState TranslationState = TranslateToExecCharset); ---------------- Is the default argument for `TranslationState` actually used anywhere? I'm skeptical that a default argument provides a benefit here. Actually, this diff doesn't include any changes to construct a `CharLiteralParser` with an explicit argument. It seems this argument isn't actually needed. The only places I see objects of `CharLiteralParser` type constructed are in `EvaluateValue()` in `clang/lib/Lex/PPExpressions.cpp` and `Sema::ActOnCharacterConstant()` in `clang/lib/Sema/SemaExpr.cpp`. ================ Comment at: clang/include/clang/Lex/LiteralSupport.h:238-239 + ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) { + LT = new LiteralTranslator(); + LT->setTranslationTables(Features, Target, *Diags); + init(StringToks, TranslationState); ---------------- I don't think a `LiteralTranslator` object is actually needed in this case. The only use of this constructor that I see is in `ModuleMapParser::consumeToken()` in `clang/lib/Lex/ModuleMap.cpp` and, in that case, I don't think any translation is necessary. This suggests that `TranslationState` is not needed for this constructor either; `NoTranslation` can be passed to `init()`. ================ Comment at: clang/include/clang/Lex/LiteralSupport.h:260-262 + unsigned getOffsetOfStringByte( + const Token &TheTok, unsigned ByteNo, + ConversionState TranslationState = TranslateToExecCharset) const; ---------------- I don't think `getOffsetOfStringByte()` should require a `ConversionState` parameter. If I understand it correctly, this function should be operating on the string in the internal encoding, never in a converted encoding. ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:19-23 +enum ConversionState { + NoTranslation, + TranslateToSystemCharset, + TranslateToExecCharset +}; ---------------- Some naming suggestions... The enumeration is not used to record a state, but rather to indicate an action to take. Also, use of both "conversion" and "translation" could be confusing, so I suggest sticking with one. Perhaps: enum class LiteralConversion { None, ToSystemCharset, ToExecCharset }; ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:30-31 + +class LiteralTranslator { +public: + llvm::StringRef InternalCharset; ---------------- I don't know the LLVM style guides well, but I suspect a class with all public members should be defined using `struct` and not include access specifiers. ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:35 + llvm::StringRef ExecCharset; + llvm::StringMap<llvm::CharSetConverter> ExecCharsetTables; + ---------------- Given the converter setters and accessors below, `ExecCharsetTables` should be a private member. ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:37 + + llvm::CharSetConverter *getConversionTable(const char *Codepage); + CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To); ---------------- `getConversionTable()` is logically `const`. Perhaps `ExecCharsetTables` should be `mutable`. From a terminology stand point, this function is misnamed. It doesn't return a table, it returns a converter for an encoding. I suggest: llvm::CharSetConverter *getCharSetConverter(const char *Encoding) const; ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:38 + llvm::CharSetConverter *getConversionTable(const char *Codepage); + CharsetTableStatusCode findOrCreateExecCharsetTable(const char *To); + llvm::CharSetConverter * ---------------- `findOrCreateExecCharsetTable()` seems oddly named since it doesn't return whatever it finds or creates. It seems like this function would be more useful if it returned a `llvm::CharSetConverter` pointer with `nullptr` indicating lookup/creation failed. This function seems like it should be an implementation detail of the class, not a public interface. ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:41-43 + void setTranslationTables(const clang::LangOptions &Opts, + const clang::TargetInfo &TInfo, + clang::DiagnosticsEngine &Diags); ---------------- `setTranslationTables()` is awkward. It is effectively operating as a constructor for the class, but isn't called at object construction and it does work that goes beyond initialization. ================ Comment at: clang/include/clang/Lex/LiteralTranslator.h:44 + clang::DiagnosticsEngine &Diags); +}; + ---------------- I suggest trying a design more like this: class LiteralTranslator { std::string SystemEncoding; std::string ExecutionEncoding; public: LiteralTranslator(llvm::StringRef SystemEncoding, llvm::StringRef ExecutionEncoding); // Retrieve the name for the system encoding. llvm::StringRef getSystemEncoding() const; // Retrieve the name for the execution encoding. llvm::StringRef getExecutionEncoding() const; // Retrieve a converter for converting from the internal encoding (UTF-8) // to the system encoding. llvm::CharSetConverter* getSystemEncodingConverter() const; // Retrieve a converter for converting from the internal encoding (UTF-8) // to the execution encoding. llvm::CharSetConverter* getExecutionEncodingConverter() const; }; LiteralTranslator createLiteralTranslatorFromOptions(const clang::LangOptions &Opts, const clang::TargetInfo &TInfo, clang::DiagnosticsEngine &Diags); ================ Comment at: clang/include/clang/Lex/Preprocessor.h:145 ModuleLoader &TheModuleLoader; + LiteralTranslator *LT = nullptr; ---------------- I don't see a reason for `LT` to be a pointer. Can it be made a reference or, better, a non-reference, non-pointer data member? ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1322-1324 + ConversionState State = isUTFLiteral(Kind) ? NoTranslation : TranslationState; + llvm::CharSetConverter *Converter = + LT ? LT->getCharConversionTable(State) : nullptr; ---------------- Per the comment associated with the constructor declaration, I don't think the new constructor parameter is needed; translation to execution character set is always desired for non-UTF character literals. I think this can be something like: llvm::CharSetConverter *Converter = nullptr; if (! isUTFLiteral(Kind)) { assert(LT); Converter = LT->getCharConversionTable(TranslateToExecCharset); } ================ Comment at: clang/lib/Lex/LiteralSupport.cpp:1593-1597 + ConversionState State = TranslationState; + if (Kind == tok::wide_string_literal) + State = TranslateToSystemCharset; + else if (isUTFLiteral(Kind)) + State = NoTranslation; ---------------- abhina.sreeskantharajan wrote: > tahonermann wrote: > > The stored `TranslationState` should not be completely ignored for wide and > > UTF string literals. The standard permits things like the following. > > #pragma rigoot L"bozit" > > #pragma rigoot u"bozit" > > _Pragma(L"rigoot bozit") > > _Pragma(u8"rigoot bozit") > > For at least the `_Pragma(L"...")` case, the C++ standard [[ > > http://eel.is/c++draft/cpp.pragma.op | states ]] the `L` is ignored, but it > > doesn't say anything about other encoding prefixes. > Please correct me if I'm wrong, these Pragma strings are not parsed through > StringLiteralParser, they are parsed in **clang/lib/Lex/Pragma.cpp** in this > function. > **void Preprocessor::Handle_Pragma(Token &Tok)** > So if they require translation, it would need to be done in that function. > Ah, ok, good. There are other cases where a string literal is not used to produce a string literal object. See https://wg21.link/p2314 for a table. You may want to audit for those cases. ================ Comment at: clang/lib/Lex/Preprocessor.cpp:88 ScratchBuf(new ScratchBuffer(SourceMgr)), HeaderInfo(Headers), - TheModuleLoader(TheModuleLoader), ExternalSource(nullptr), + TheModuleLoader(TheModuleLoader), LT(new LiteralTranslator()), + ExternalSource(nullptr), ---------------- Per comments elsewhere, please try to make `LT` a non-pointer non-reference data member. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D93031/new/ https://reviews.llvm.org/D93031 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits