[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Tom Honermann via Phabricator via cfe-commits Mon, 10 Oct 2022 14:06:56 -0700

tahonermann added inline comments.


================
Comment at: clang/lib/AST/Interp/ByteCodeExprGen.cpp:445-446
+    const CharacterLiteral *E) {
+  if (Optional<PrimType> T = classify(E->getType()))
+    return this->emitConst(E, E->getValue());
+
----------------
`T` isn't used and `classify()` already returns an `Optional<PrimType>`, so no 
conversion coercion appear to be intended here. I think this can be simplified 
to:
  if (classify(E))
    this->emitConst(E, E->getValue());


================
Comment at: clang/test/AST/Interp/arrays.cpp:138
+  constexpr char32_t c = U'\U0001F60E';
+  static_assert(c == U'😎');
+
----------------
This check depends on the source file being UTF-8 encoded. Perhaps compare 
against `0x0001F60EL` instead.


================
Comment at: clang/test/AST/Interp/arrays.cpp:143
+
+};
----------------
As others already noted, additional testing of multicharacter literals and UCNs 
(including named universal characters like `\N{LATIN_CAPITAL_LETTER_E}` would 
be beneficial. Some tests of character escapes like `\t` wouldn't hurt either.

Clang does not yet support use of `-fexec-charset` to set the literal encoding 
(execution character set) to anything other than UTF-8 though work on that has 
been done (see D93031). If such work was completed, it would be useful to run 
some tests against a non-UTF-8 encoding. Maybe next year.


================
Comment at: clang/test/AST/Interp/arrays.cpp:135
+  static_assert(u32[1] == U'b', "");
+};
+
----------------
cor3ntin wrote:
> aaron.ballman wrote:
> > shafik wrote:
> > > tbaeder wrote:
> > > > aaron.ballman wrote:
> > > > > I think you need a more coverage for character literals. Some test 
> > > > > cases that are interesting: multichar literals (`'abcd'`), character 
> > > > > literals with UCNs (`'\uFFFF'`), character literals with numeric 
> > > > > escapes (`'\xFF'`). I'm especially interested in seeing whether we 
> > > > > handle integer promotions properly, especially when promoting to 
> > > > > `unsigned int`.
> > > > I added two more test cases but I'm generally not that familiar with 
> > > > character  literal edge cases and integer promotion, so if you have 
> > > > concrete test cases in mind, that would be great :)
> > > We can find GNU documentation of multi-char literals here: 
> > > https://gcc.gnu.org/onlinedocs/cpp/Implementation-defined-behavior.html#Implementation-defined-behavior
> > >  and I believe we follow the same scheme. 
> > > 
> > > There are some weird cases like `'\8'` which compilers seem to treat 
> > > consistently but generate a diagnostic for.
> > CC @tahonermann and @cor3ntin as text encoding code owners -- they likely 
> > know all the worst test cases to be thinking about in this space.
> Most weirdness are taken care of during lexing.
> What is interesting to test during evaluation
> 
>  * multi character literals ie `'abcd'`
>  * `u8` (utf8) and `u` literals (utf16)
> 
> Note that wide characters literals were removed from C++ so it's no longer a 
> concern
I believe Corentin meant that wide **multi**character literals were removed 
from C++. That is true for C++23 (they were removed by the adoption of [[ 
https://wg21.link/P2362R3 | P2362R3 ]] during the July 2022 WG21 plenary) but 
that change was not adopted as a DR, so they remain part of the language for 
prior language modes. However, they were a conditionally supported feature 
that, as of Clang 14, are diagnosed as an error in all language modes. All that 
to say that I agree; no test for them needed.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135366/new/

https://reviews.llvm.org/D135366

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D135366: [clang][Interp] Implement String- and CharacterLiterals

Reply via email to