https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116942
Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-10-03

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Hmm, ECMAScript doesn't have the concept of char being signed or unsigned; characters are all Unicode. So 0x80 is just U+0080, I think. But since U+0080 is a multibyte character in UTF-8, it can never match a single char (whether signed or unsigned). std::regex is not Unicode-aware, so std::regex("\x80") doesn't match "\u0080" even on a platform where char is unsigned.

I don't know what C++ expects to happen here, given that char strings are not necessarily Unicode, unlike in ECMAScript.

Even if we fix our std::regex to accept [\x00-\xFF] as a range, it will not match non-ASCII characters the same way ECMAScript does. It will treat \xFF as char(0xFF) instead of char16_t(u'\u00FF').
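To illustrate the first point, here is a minimal sketch (my own illustration, not taken from the report), assuming an 8-bit char and a UTF-8 execution character set, so that "\u0080" encodes as the two bytes 0xC2 0x80:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string subject = "\u0080";   // two bytes in UTF-8: 0xC2 0x80
    std::regex  pattern("\x80");      // a single char with value 0x80

    // In ECMAScript this pattern would match, because both pattern and
    // subject denote the single code point U+0080. std::regex compares
    // individual char values, so a one-char pattern cannot match the
    // two-char UTF-8 sequence as a whole.
    std::cout << std::boolalpha
              << std::regex_match(subject, pattern) << '\n';   // prints false
}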