Our IsUTF8() by default rejects strings that contain code points whose lowest 16 bits are 0xFFFE or 0xFFFF.
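
To make the quirk concrete, here is a minimal sketch in Rust of the two behaviors being compared. It is not the actual IsUTF8() code nor the Rust code I have in mind; the function names are hypothetical and the only library call is the standard library's std::str::from_utf8, which accepts non-characters.

```rust
/// Hypothetical helper (not Gecko code): true if the lowest 16 bits of the
/// code point are 0xFFFE or 0xFFFF, e.g. U+FFFE, U+FFFF, U+1FFFE, U+10FFFF.
fn low_bits_are_fffe_or_ffff(c: char) -> bool {
    ((c as u32) & 0xFFFF) >= 0xFFFE
}

/// Plain UTF-8 validity check: non-characters are accepted.
fn is_utf8_plain(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Assumed approximation of the quirk described above: the input must be
/// valid UTF-8 and must not contain any code point matched by the helper.
fn is_utf8_rejecting_noncharacters(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => !s.chars().any(low_bits_are_fffe_or_ffff),
        Err(_) => false,
    }
}
```

Under these assumptions the two functions disagree only on well-formed input that contains such a code point: the bytes EF BF BF (U+FFFF) pass the plain check and fail the quirky one, while malformed input such as a lone 0xFF byte fails both.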
Do we actually have use cases for rejecting such strings in UTF-8ness checks? The code was introduced in https://bugzilla.mozilla.org/show_bug.cgi?id=191541, and both the patch author and the reviewer seemed unsure of the utility of this quirk at the time.

(In any case, to reduce bloat and to benefit from SIMD, I'd like to replace the implementation of IsUTF8() with a call to Rust code that contains optimized UTF-8ness checking code, but that code doesn't have the quirk of rejecting non-characters.)

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/