Do we actually have use cases for rejecting non-characters in UTF-8ness check?

Henri Sivonen Fri, 17 Mar 2017 03:01:32 -0700

Our IsUTF8() by default rejects strings that contain code points whose
lowest 16 bits are 0xFFFE or 0xFFFF, that is, the non-characters
U+FFFE, U+FFFF, U+1FFFE, U+1FFFF and so on up to U+10FFFF.


Do we actually have use cases for rejecting such strings in UTF-8ness checks?

The code was introduced in
https://bugzilla.mozilla.org/show_bug.cgi?id=191541 and both the patch
author and the reviewer seemed unsure of the utility of this quirk at
the time.

(In any case, to reduce bloat and to benefit from SIMD, I'd like to
replace the implementation of IsUTF8() with a call to Rust code that
contains optimized UTF-8ness checking code, but that code doesn't
have the quirk of rejecting non-characters.)
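
To make the difference concrete, here is a minimal sketch in Rust. It
is not the actual IsUTF8() code nor the Rust code I intend to call;
the function names are hypothetical, and the Rust standard library's
std::str::from_utf8 stands in for the optimized checker. It only
illustrates how plain UTF-8 validation differs from validation with
the non-character quirk.

```rust
/// Plain UTF-8 validation via the Rust standard library.
/// Non-characters such as U+FFFE and U+FFFF are well-formed UTF-8
/// and are therefore accepted.
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Hypothetical stand-in for the quirk described above: the input must
/// be valid UTF-8 and must not contain any code point whose lowest
/// 16 bits are 0xFFFE or 0xFFFF.
fn is_utf8_rejecting_nonchars(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        // Valid UTF-8: additionally scan for U+xxFFFE / U+xxFFFF.
        Ok(s) => !s.chars().any(|c| ((c as u32) & 0xFFFF) >= 0xFFFE),
        // Malformed UTF-8 is rejected either way.
        Err(_) => false,
    }
}
```

Under these assumptions, is_utf8("\u{FFFF}".as_bytes()) is true while
is_utf8_rejecting_nonchars("\u{FFFF}".as_bytes()) is false. The two
functions agree on every input that contains no such code point.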

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
