https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116942

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-10-03

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Hmm, ECMAScript doesn't have the concept of char being signed or unsigned;
characters are all Unicode. So 0x80 is just U+0080, I think. But since U+0080
is a multibyte character in UTF-8, it can never match a single char (either
signed or unsigned). std::regex is not Unicode-aware and so std::regex("\x80")
doesn't match "\u0080" even on a platform where char is unsigned.

I don't know what C++ expects to happen here, given that char strings are not
necessarily Unicode, unlike in ECMAScript.

Even if we fix our std::regex to accept [\x00-\xFF] as a range, it will not
match non-ASCII characters the same way ECMAScript does. It will treat \xFF as
char(0xFF) instead of char16_t(u'\u00FF').
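
For illustration only (a sketch assuming a UTF-8 narrow execution charset,
not the proposed fix), the two views of the same character:

#include <iostream>
#include <string>

int main()
{
    // U+00FF as ECMAScript sees it: a single UTF-16 code unit.
    std::u16string js_view = u"\u00FF";
    std::cout << js_view.size() << '\n';   // 1

    // The same character in a UTF-8 narrow string: two bytes, 0xC3 0xBF.
    std::string cpp_view = "\u00FF";
    std::cout << cpp_view.size() << '\n';  // 2

    // ECMAScript's [\x00-\xFF] asks whether the single code unit is
    // <= 0x00FF, so it matches.  A byte-wise range over char would be
    // asked about the bytes 0xC3 and 0xBF separately, which is a
    // different question (and with signed char both bytes are negative).
}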
