On 24 May 2013 08:26, Ferenc Kovacs <tyr...@gmail.com> wrote:
> On Fri, May 24, 2013 at 3:09 PM, Nikita Popov <nikita....@gmail.com> wrote:
>> We already have a lot of functions for multibyte string handling. Let me
>> list a few:
>>
>> * The str* functions. Most of them are safe for usage with UTF-8.
>>   Exceptions are basically everything where you manually provide an offset,
>>   e.g. writing substr($str, 0, 100) is not safe. substr($str, 0, strpos($str,
>>   'xyz')) on the other hand is.
>> * The mb* functions. They work with various encodings and usually make use
>>   of character offsets and lengths rather than byte offsets and lengths. They
>>   are not necessary most of the time, but useful for the aforementioned
>>   substr call with hardcoded offsets.
>> * The Intl extension. This gives you *real* Unicode support, as in
>>   collations, locales, transliteration, etc.
>> * The grapheme* functions, which are also part of intl. They work with
>>   grapheme cluster offsets and lengths.
>>
>> Anyway, my point is that just adding *yet another* set of string functions
>> won't solve anything, just make things even more complicated than they
>> already are. I'm not strictly opposed to adding more functions if they are
>> necessary, but one has to be aware of what there already is and how the new
>> functions will integrate.
>>
>> Nikita
>>
>
> did you just forget the pcre functions with the /u modifier?!?!
> :P

And that's without even touching PECL. :)

I agree with Nikita. I'm not against adding more Unicode/charset
handling functions if they make sense (and I haven't looked at the code
for this particular proposal yet), particularly if they'd be part of a
default build. But enough water has hopefully passed under the bridge
since the PHP 6 days that it might be time to canvass ideas on a less
piecemeal approach to character set handling and internationalisation
for PHP 5.5+1 or PHP 5.5+2.

Adam

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php