This++.

  Unicode is not a static definition of all characters.  New emoji are
added to the specification regularly, and while a glyph like 👪 might
look like a single "character" to a set of human eyes, and indeed has
been a single code point (U+1F46A) since Unicode 6.0, a family glyph can
also be expressed (and still can, FTR) using Zero Width Joiners as five
separate code points: [MAN][ZWJ][WOMAN][ZWJ][BOY], which mb_strlen() will
tell you is five "characters" long, despite being visible as a single
grapheme.  Okay, so we look at the ICU grapheme functions, but depending on
what version of the Unicode database is installed, that answer may be
one, or it may be more.
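
A minimal sketch of that mismatch, assuming PHP 7+ (for the "\u{...}"
escape) with the mbstring extension loaded and intl optionally available;
the variable name $family is only for illustration:

```php
<?php
// MAN + ZWJ + WOMAN + ZWJ + BOY, written as five explicit code points.
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}";

// Bytes in UTF-8: 4 + 3 + 4 + 3 + 4.
var_dump(strlen($family));             // int(18)

// Code points, not user-perceived characters.
var_dump(mb_strlen($family, 'UTF-8')); // int(5)

// Grapheme clusters as ICU sees them. The result depends on the
// ICU / Unicode data version installed, so no expected value is given.
if (function_exists('grapheme_strlen')) {
    var_dump(grapheme_strlen($family));
}
```

None of the three numbers is "wrong"; each one answers a different question
about the same string.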

In short: Language is complicated and there's not a one-size-fits-all
solution.

-Sara


Thank you, Sara, for a great example. I didn't know that the topic was covered in PHP 6.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php
