On 10/20/22 13:31, Ben Boeckel wrote:
> On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
>> Oops, I was thinking this was in gcc as well.  In libcpp there's
>> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).
>
> This routine has a lot more logic (including UCN decoding) and the
> `one_utf8_to_cppchar` also supports out-of-bounds codepoints above
> `0x10FFFF`.

The latter seems like a bug to be fixed; presumably it hasn't been updated since the range of codepoints was restricted. This sort of thing is why I'd like to minimize the number of separate implementations of UTF-8 parsing.
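
To illustrate the kind of check I mean, here is a rough sketch of a strict decoder. It is not the libcpp code: `decode_utf8_strict` is a made-up name, and the return convention (bytes consumed, 0 on error) is an assumption for the example. It rejects 5- and 6-byte lead bytes, overlong forms, surrogates, and anything above `0x10FFFF`, which is the limit RFC 3629 set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the libcpp API: decode one UTF-8 sequence
   from s[0..len) into *out.  Returns the number of bytes consumed, or
   0 if the sequence is malformed or does not decode to a Unicode
   scalar value.  */
static size_t
decode_utf8_strict (const unsigned char *s, size_t len, uint32_t *out)
{
  assert (s != NULL && out != NULL);
  if (len == 0)
    return 0;

  uint32_t c = s[0];
  uint32_t min;   /* smallest codepoint allowed for this length */
  size_t n;       /* total length of the sequence in bytes */

  if (c < 0x80)
    {
      *out = c;
      return 1;
    }
  else if ((c & 0xE0) == 0xC0) { n = 2; min = 0x80;    c &= 0x1F; }
  else if ((c & 0xF0) == 0xE0) { n = 3; min = 0x800;   c &= 0x0F; }
  else if ((c & 0xF8) == 0xF0) { n = 4; min = 0x10000; c &= 0x07; }
  else
    return 0;     /* stray continuation byte, or a 5/6-byte lead byte */

  if (len < n)
    return 0;     /* truncated sequence */

  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0; /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  if (c < min)
    return 0;     /* overlong encoding */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;     /* UTF-16 surrogate, not a scalar value */
  if (c > 0x10FFFF)
    return 0;     /* beyond the Unicode codespace */

  *out = c;
  return n;
}
```

With a single routine like this, the upper bound lives in one place rather than in each copy of the parser.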

Jason
