Hello,

I think that Rowan is right: PHP users need to manipulate grapheme clusters first (and code points in some rare situations). The fact that most of us live in a world where NFC composes all our characters only hides this reality.
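To make the point concrete, here is a small illustration (in Python, outside PHP). Some user-perceived characters have no precomposed form, so NFC cannot collapse them into one code point. Reversing or truncating by code point then mangles them. In PHP the real tool is the grapheme_*() family from intl. The splitter below is only a hypothetical, simplified stand-in (`naive_graphemes`, `grapheme_reverse` and `grapheme_truncate` are illustrative names). It attaches combining marks to the preceding character and ignores the rest of UAX #29.

```python
import unicodedata


def naive_graphemes(text: str) -> list[str]:
    """Approximate grapheme clusters: a base character plus any following
    combining marks (general category Mn, Mc or Me).

    Simplified sketch only: full UAX #29 segmentation also handles Hangul
    syllables, emoji ZWJ sequences, regional-indicator pairs, CRLF, etc.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def grapheme_reverse(text: str) -> str:
    """Reverse cluster by cluster, so marks stay on their base character."""
    return "".join(reversed(naive_graphemes(text)))


def grapheme_truncate(text: str, length: int) -> str:
    """Keep the first `length` clusters (e.g. for an excerpt)."""
    return "".join(naive_graphemes(text)[:length])


# "g" + COMBINING DIAERESIS has no precomposed form, so NFC keeps it as two
# code points, while "e" + COMBINING ACUTE ACCENT does compose to U+00E9.
word = unicodedata.normalize("NFC", "g\u0308e\u0301")

print(len(word))                                   # 3 (code points)
print(len(naive_graphemes(word)))                  # 2 (user-perceived characters)
print(word[::-1] == "\u00e9\u0308g")               # True: diaeresis jumps onto "é"
print(grapheme_reverse(word) == "\u00e9g\u0308")   # True: marks stay in place
print(word[:1] == "g")                             # True: code point slice drops the mark
print(grapheme_truncate(word, 1) == "g\u0308")     # True
```

A Unicode-aware strrev() or excerpt function has to work at this cluster level. With a real UAX #29 segmenter in place of the approximation, the rest of the logic stays the same.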
A typical use case is a template engine, which is typical for a PHP app. Nearly all string manipulations there need grapheme awareness: cutting strings to get an excerpt, inserting a space between every "character", changing the case, etc. Another use case is implementing text indexing in PHP: you need to normalize before indexing, handle case folding, and thus think in terms of graphemes. I'm not sure this is frequent in PHP, though.

As already said, alongside grapheme clusters we should also deal with string matching: collations are out of scope, but normalization and case folding are in. Please do not forget the Turkish alphabet either <https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/TurkishUtf8.php>. IMHO this is required to give users what they expect from str_replace, strpos, strcmp, etc.

I wrote a fairly successful library to deal with this in PHP: https://github.com/nicolas-grekas/Patchwork-UTF8

My experience from it is the following:
- dealing with grapheme clusters in current PHP is OK with the grapheme_*() functions, but these require intl. It would be great to have them (or an equivalent) in core,
- NFC normalization of all input is required for string comparisons, so having Normalizer in core looks required too,
- almost everybody uses mbstring when dealing with UTF-8 strings, but almost all of those cases should use a grapheme_*() function instead.

>> To be clear, I am suggesting that we aim to be the language which gets
>> this right, where other languages get it wrong.
>
> Thank you for explaining this. I also think it could do better. I think
> Unicode-aware strrev() shouldn't be too complicated to do.

Perl 6 identified the subject very well and invented what they call "NFG", which is NFC plus dynamic internal code points for grapheme clusters that have no composed form:
http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd28_strings.pod.html

Maybe worth looking at?

Cheers,
Nicolas
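The string-matching point (normalize, then case-fold, with Turkish as a special case) can be sketched as follows, again in Python for illustration. In PHP the normalization step would be Normalizer::normalize() from intl. The helper names `same_text` and `caseless_equal` and the `turkish` flag are hypothetical. `str.casefold()` applies Unicode's default (non-Turkic) folding. The Turkish branch hand-applies the two Turkic-specific mappings from CaseFolding.txt (I to dotless ı, İ to i) before folding, and it is not a full locale-aware implementation.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def same_text(a: str, b: str) -> bool:
    """Canonical equivalence check: compare after NFC, no case folding."""
    return nfc(a) == nfc(b)


def caseless_equal(a: str, b: str, turkish: bool = False) -> bool:
    """Caseless comparison: NFC, optional Turkic i mappings, case fold, NFC."""

    def fold(s: str) -> str:
        s = nfc(s)  # composes e.g. I + U+0307 into U+0130 first
        if turkish:
            # Turkic pairs: I <-> \u0131 (dotless), \u0130 <-> i (dotted).
            s = s.replace("I", "\u0131").replace("\u0130", "i")
        # casefold() may emit decomposed sequences, so renormalize.
        return nfc(s.casefold())

    return fold(a) == fold(b)


print("e\u0301" == "\u00e9")                                        # False
print(same_text("e\u0301", "\u00e9"))                               # True
print(caseless_equal("CAFE\u0301", "caf\u00e9"))                    # True
print(caseless_equal("DI\u015e", "d\u0131\u015f"))                  # False: default folding gives "di\u015f"
print(caseless_equal("DI\u015e", "d\u0131\u015f", turkish=True))    # True
print(caseless_equal("\u0130STANBUL", "istanbul"))                  # False: \u0130 folds to "i\u0307"
print(caseless_equal("\u0130STANBUL", "istanbul", turkish=True))     # True
```

Functions like strpos or str_replace would need the same normalize-and-fold step on both haystack and needle to behave the way users expect.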