On Fri, Feb 11, 2022 at 3:14 AM Rowan Tommins <rowan.coll...@gmail.com> wrote:
> There's also I think a myth in people's minds that something like
> "string length" has a single meaning, and PHP gets it "wrong" for
> multibyte strings;

This++.

Unicode is not a static standard definition of all characters. New emoji are added to the specification regularly, and while a glyph like 👪 might look like a single "character" to a set of human eyes, and indeed has been a single code point (U+1F46A) since Unicode 6.0, it can also be expressed (and still is, FTR) using Zero Width Joiners as five separate code points: [MAN][ZWJ][WOMAN][ZWJ][BOY], which mb_strlen() will tell you is five "characters" long, despite being visible as a single grapheme.

Okay, so we look at the ICU grapheme functions, but depending on what version of the Unicode database is installed, that answer may be five or one.

In short: Language is complicated and there's not a one-size-fits-all solution.

-Sara
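
For anyone who wants to see the numbers, here is a minimal sketch of the two spellings of the family emoji. It is written in Python rather than PHP purely for illustration: Python's len() counts code points, much as mb_strlen() does with UTF-8 input, and the encoded byte length corresponds to what strlen() would report. The counts() helper is a made-up name for this example. Grapheme counting is left out on purpose, since that result depends on the installed Unicode data.

```python
# Two ways to write a "family" emoji.
single = "\U0001F46A"  # U+1F46A FAMILY, one code point
joined = "\U0001F468\u200D\U0001F469\u200D\U0001F466"  # MAN ZWJ WOMAN ZWJ BOY


def counts(s):
    """Return (code points, UTF-8 bytes) for s. Hypothetical helper."""
    return len(s), len(s.encode("utf-8"))


print(counts(single))  # (1, 4)
print(counts(joined))  # (5, 18)
```

Neither pair of numbers says anything about how many glyphs a reader will see, which is the point: "length" depends on which unit you choose to count.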