Re: [PHP-DEV] [RFC] UString

Nicolas Grekas Tue, 21 Oct 2014 01:49:23 -0700

This is great, thanks for the work!
I think we should take a position on grapheme clusters and state it in
the RFC.


I do support the idea that PHP users need to handle "characters" in terms of
"graphemes". We need a core way to deal with code points of course, but
operations like "reverse" have very little value without graphemes.
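
To illustrate the "reverse" point, here is a minimal sketch in Python (not
PHP, and not the RFC's API). The function names are made up for this
example, and the second function is only an approximation: it keeps
combining marks attached to their base character, which is far less than
full grapheme cluster segmentation.

```python
import unicodedata


def reverse_code_points(s: str) -> str:
    """Reverse code point by code point (what a naive reverse does)."""
    return s[::-1]


def reverse_combining_aware(s: str) -> str:
    """Reverse while keeping combining marks attached to their base.

    Approximation only: a code point with a non-zero canonical combining
    class is glued to the preceding one. Real grapheme clusters (emoji
    sequences, Hangul jamo, etc.) need a proper segmentation algorithm.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


# "noël" spelled with "e" followed by U+0308 COMBINING DIAERESIS
word = "noe\u0308l"

naive = reverse_code_points(word)      # the diaeresis moves onto the "l"
aware = reverse_combining_aware(word)  # the diaeresis stays on the "e"
```

With the naive version the accent ends up on a different letter, which is
why a code point level reverse is rarely what a user wants.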

toLower/toUpper also miss the Turkish specifics. Or is the UString class
locale-dependent?
Should we add "toCaseFold"? Where are the case-insensitive ("i") versions of
strpos, etc.? Do we want them in core PHP7? Another point we should add to
the RFC.
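
A small Python sketch of both issues, for illustration only (this is not
the UString API; turkish_lower and caseless_equal are hypothetical helpers
written for this example). Python's str.lower() is locale-independent, so
it shows what happens when the Turkish dotted/dotless "i" rules are
ignored, and str.casefold() shows why case folding differs from
lowercasing.

```python
def turkish_lower(s: str) -> str:
    """Lowercase using the Turkish rules: I -> ı (U+0131), İ (U+0130) -> i."""
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison based on case folding, not lowercasing."""
    return a.casefold() == b.casefold()


# Locale-independent lowercasing maps "I" to "i", wrong for Turkish text.
default = "KIRMIZI".lower()
turkish = turkish_lower("KIRMIZI")  # uses dotless ı

# Lowercasing keeps "ß", case folding expands it to "ss".
same_lower = "Straße".lower() == "STRASSE".lower()
same_fold = caseless_equal("Straße", "STRASSE")
```

Here same_lower is false while same_fold is true, which is the kind of
difference a "toCaseFold" method would expose.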

For reference here is my grapheme cluster aware string handling:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/Utf8.php

and the Turkish variant of the same:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/master/class/Patchwork/TurkishUtf8.php

About Unicode equivalence:
Do the string matching functions (contains, startsWith, etc.) handle
Unicode equivalence?
How do we compare two UStrings? Does the == operator handle Unicode
equivalence? If not, what is the way to go? Normalize beforehand on our
own?
The RFC should cover this as well IMHO (and state that collation/sorting
handling is out of scope).
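
To make the question concrete, here is a minimal Python sketch of
"normalize first, then compare" (illustrative only; equivalent and
starts_with are hypothetical helpers, and nothing here says how UString
behaves). It uses unicodedata.normalize from the Python standard library.

```python
import unicodedata


def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def starts_with(haystack: str, prefix: str, form: str = "NFC") -> bool:
    """Prefix match that normalizes both sides first."""
    return unicodedata.normalize(form, haystack).startswith(
        unicodedata.normalize(form, prefix)
    )


composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

raw_equal = composed == decomposed               # plain code point comparison
canonical_equal = equivalent(composed, decomposed)

# Compatibility equivalence needs NFKC/NFKD: U+FB01 (fi ligature) vs "fi"
compat_nfc = equivalent("\ufb01", "fi", "NFC")
compat_nfkc = equivalent("\ufb01", "fi", "NFKC")
```

A plain comparison treats the two spellings of "é" as different, so the
RFC needs to say whether == and the matching methods normalize, and if so
to which form.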

Complex topic :)

Cheers,
Nicolas
