Re: [PHP-DEV] Proposal for better UTF-8 handling

Adam Harvey Fri, 24 May 2013 08:35:38 -0700

On 24 May 2013 08:26, Ferenc Kovacs <tyr...@gmail.com> wrote:
> On Fri, May 24, 2013 at 3:09 PM, Nikita Popov <nikita....@gmail.com> wrote:
>> We already have a lot of functions for multibyte string handling. Let me
>> list a few:
>>
>>  * The str* functions. Most of them are safe for usage with UTF-8.
>> Exceptions are basically everything where you manually provide an offset,
>> e.g. writing substr($str, 0, 100) is not safe. substr($str, 0, strpos($str,
>> 'xyz')) on the other hand is.
>>  * The mb* functions. They work with various encodings and usually make
>> use of character offsets and lengths rather than byte offsets and lengths. They
>> are not necessary most of the time, but useful for the aforementioned
>> substr call with hardcoded offsets.
>>  * The Intl extension. This gives you *real* Unicode support, as in
>> collations, locales, transliteration, etc.
>>  * The grapheme* functions which are also part of intl. They work with
>> grapheme cluster offsets and lengths.
>>
>> Anyway, my point is that just adding *yet another* set of string functions
>> won't solve anything, just make things even more complicated than they
>> already are. I'm not strictly opposed to adding more functions if they are
>> necessary, but one has to be aware of what there already is and how the new
>> functions will integrate.
>>
>> Nikita
>>
>
> Did you just forget the pcre functions with the /u modifier?!?!
> :P


And that's without even touching PECL. :)
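
For anyone skimming the thread, here is a rough sketch of the byte-offset
versus character-offset distinction from the quoted list. It is only an
illustration under stated assumptions, not part of the proposal: the sample
string and the is_valid_utf8() helper are made up for this example, and the
mb_substr() call assumes the mbstring extension is loaded (hence the
function_exists() guard). The strpos() case stays safe because a valid UTF-8
needle can only match at a character boundary.

```php
<?php
// Sample string "héllo xyz"; "é" is written as its two UTF-8 bytes
// (0xC3 0xA9) so the example does not depend on the file's encoding.
$str = "h\xC3\xA9llo xyz";

// Hypothetical helper: PCRE with /u returns false on malformed UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Hardcoded byte offset: keeps "h" plus the first byte of "é" only.
$broken = substr($str, 0, 2);

// Offset taken from strpos(): lands on a character boundary.
$safe = substr($str, 0, strpos($str, 'xyz'));

// PCRE with /u matches whole code points, so this yields "h" and "é".
preg_match('/^.{2}/u', $str, $m);
$viaPcre = $m[0];

// mbstring counts characters rather than bytes (if it is available).
$viaMb = function_exists('mb_substr') ? mb_substr($str, 0, 2, 'UTF-8') : null;

var_dump(is_valid_utf8($broken)); // bool(false)
var_dump(is_valid_utf8($safe));   // bool(true)
```

Note that none of these work on grapheme clusters; for that the grapheme*
functions in intl are still the tool to reach for.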

I agree with Nikita. I'm not against adding more Unicode/charset
handling functions if they make sense (and I haven't looked at the
code for this particular proposal yet), particularly if they'd be part
of a default build. But enough water has hopefully passed under the
bridge since the PHP 6 days that it might be time to canvass ideas on
a less piecemeal approach to character set handling and
internationalisation for PHP 5.5+1 or PHP 5.5+2.

Adam

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
