tahonermann added inline comments.
================ Comment at: clang/lib/AST/Interp/ByteCodeExprGen.cpp:445-446 + const CharacterLiteral *E) { + if (Optional<PrimType> T = classify(E->getType())) + return this->emitConst(E, E->getValue()); + ---------------- `T` isn't used and `classify()` already returns an `Optional<PrimType>`, so no conversion coercion appear to be intended here. I think this can be simplified to: if (classify(E)) this->emitConst(E, E->getValue()); ================ Comment at: clang/test/AST/Interp/arrays.cpp:138 + constexpr char32_t c = U'\U0001F60E'; + static_assert(c == U'😎'); + ---------------- This check depends on the source file being UTF-8 encoded. Perhaps compare against `0x0001F60EL` instead. ================ Comment at: clang/test/AST/Interp/arrays.cpp:143 + +}; ---------------- As others already noted, additional testing of multicharacter literals and UCNs (including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would be beneficial. Some tests of character escapes like `\t` wouldn't hurt either. Clang does not yet support use of `-fexec-charset` to set the literal encoding (execution character set) to anything other than UTF-8 though work on that has been done (see D93031). If such work was completed, it would be useful to run some tests against a non-UTF-8 encoding. Maybe next year. ================ Comment at: clang/test/AST/Interp/arrays.cpp:135 + static_assert(u32[1] == U'b', ""); +}; + ---------------- cor3ntin wrote: > aaron.ballman wrote: > > shafik wrote: > > > tbaeder wrote: > > > > aaron.ballman wrote: > > > > > I think you need a more coverage for character literals. Some test > > > > > cases that are interesting: multichar literals (`'abcd'`), character > > > > > literals with UCNs (`'\uFFFF'`), character literals with numeric > > > > > escapes (`'\xFF'`). I'm especially interested in seeing whether we > > > > > handle integer promotions properly, especially when promoting to > > > > > `unsigned int`. > > > > I added two more test cases but I'm generally not that familiar with > > > > character literal edge cases and integer promotion, so if you have > > > > concrete test cases in mind, that would be great :) > > > We can find GNU documentation of multi-char literals here: > > > https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior > > > and I believe we follow the same scheme. > > > > > > There are some weird cases like `'\8'` which compilers seem to treat > > > consistently but generate a diagnostic for. > > CC @tahonermann and @cor3ntin as text encoding code owners -- they likely > > know all the worst test cases to be thinking about in this space. > Most weirdness are taken care of during lexing. > What is interesting to test during evaluation > > * multi character literals ie `'abcd'` > * `u8` (utf8) and `u` literals (utf16) > > Note that wide characters literals were removed from C++ so it's no longer a > concern I believe Corentin meant that wide **multi**character literals were removed from C++. That is true for C++23 (they were removed by the adoption of [[ https://wg21.link/P2362R3 | P2362R3 ]] during the July 2022 WG21 plenary) but that change was not adopted as a DR, so they remain part of the language for prior language modes. However, they were a conditionally supported feature that, as of Clang 14, are diagnosed as an error in all language modes. All that to say that I agree; no test for them needed. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D135366/new/ https://reviews.llvm.org/D135366 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits