This is great, thanks for the work! I think we should take a position on grapheme clusters and state it in the RFC.
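To make the grapheme point concrete, here is a small sketch. It is written in Python only because Python's standard library exposes Unicode character categories. The function names are hypothetical and are not part of the proposed Ustring API. Reversing code point by code point moves a combining mark onto the wrong letter, while reversing clusters keeps it attached. The clustering is a deliberate simplification. It assumes a cluster is a base character plus any following combining marks (category M*). Full UAX #29 segmentation also handles emoji sequences, Hangul syllables, CRLF and more.

```python
import unicodedata


def naive_reverse(s: str) -> str:
    # Reverses code points: combining marks end up attached to the wrong base letter.
    return s[::-1]


def split_simplified_graphemes(s: str) -> list[str]:
    # Simplified approximation of grapheme clusters (NOT full UAX #29):
    # a cluster is one base code point followed by any combining marks
    # (general category Mn, Mc or Me).
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def grapheme_reverse(s: str) -> str:
    # Reverses whole clusters, so each mark stays on its base letter.
    return "".join(reversed(split_simplified_graphemes(s)))


# "noël" written in decomposed form: 'e' followed by U+0308 COMBINING DIAERESIS
word = "noe\u0308l"
print(ascii(naive_reverse(word)))     # the diaeresis now follows the 'l'
print(ascii(grapheme_reverse(word)))  # the diaeresis still follows the 'e'
```

The same string has five code points but four user-perceived characters, which is why length, substring and reverse operations give surprising results when they only know about code points.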
I support the idea that PHP users need to handle "characters" in terms of graphemes. We need a core way to deal with code points, of course, but operations like "reverse" have very little value without graphemes. toLower/toUpper also miss the Turkish specifics (dotted and dotless i), or is the Ustring class locale-dependent? Should we add a "toCaseFold"? And where are the "i" (case-insensitive) versions of strpos, etc.? Do we want them in core PHP 7? That is another point the RFC should address.

For reference, here is my grapheme-cluster-aware string handling:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/Utf8.php
and the same with the Turkish variant:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/TurkishUtf8.php

About Unicode equivalence: do the string matching functions (contains, startsWith, etc.) handle Unicode equivalence? How do we compare two Ustrings? Does the == operator handle Unicode equivalence? If not, what is the way to go: should we normalize beforehand on our own? IMHO the RFC should address this too (and state that collation/sorting is out of scope).

Complex topic :)

Cheers,
Nicolas