On 10/20/22 13:31, Ben Boeckel wrote:
> On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
>> Oops, I was thinking this was in gcc as well. In libcpp there's
>> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).
>
> This routine has a lot more logic (including UCN decoding) and the
> `one_utf8_to_cppchar` also supports out-of-bounds codepoints above
> `0x10FFFF`.

The latter seems like a bug to be fixed; presumably it hasn't been
updated since the range of codepoints was restricted. This sort of
thing is why I'd like to minimize the number of separate implementations
of UTF-8 parsing.
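
For illustration only, here is a rough sketch of the kind of range check under discussion. It is not libcpp's code: `decode_utf8_strict` is a made-up name, and its signature and return convention are assumptions chosen for the example. It decodes one sequence and rejects code points above `0x10FFFF`, together with surrogates, overlong forms, and the obsolete 5- and 6-byte lead bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libcpp.  Decode one UTF-8 sequence
   from S[0..LEN) into *OUT.  Return the number of bytes consumed, or 0
   if the sequence is truncated, malformed, overlong, a surrogate, or
   encodes a value above 0x10FFFF.  */
static size_t
decode_utf8_strict (const unsigned char *s, size_t len, uint32_t *out)
{
  /* Smallest code point that legitimately needs N bytes, indexed by N.  */
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)
    {
      *out = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    {
      n = 2;
      c = s[0] & 0x1F;
    }
  else if ((s[0] & 0xF0) == 0xE0)
    {
      n = 3;
      c = s[0] & 0x0F;
    }
  else if ((s[0] & 0xF8) == 0xF0)
    {
      n = 4;
      c = s[0] & 0x07;
    }
  else
    return 0;	/* Stray continuation byte or 5/6-byte lead.  */

  if (len < n)
    return 0;

  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
	return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }

  if (c < min_for_len[n])		/* Overlong encoding.  */
    return 0;
  if (c >= 0xD800 && c <= 0xDFFF)	/* UTF-16 surrogate.  */
    return 0;
  if (c > 0x10FFFF)			/* Beyond the Unicode range.  */
    return 0;

  *out = c;
  return n;
}
```

With a check like this, the bytes `F4 8F BF BF` (U+10FFFF) are accepted, while `F4 90 80 80` (which would decode to 0x110000) is rejected. Keeping that logic in one shared routine would mean there is only one place to update when the rules change.
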
Jason