Hi,

On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_for...@fmethod.com> wrote:

> UTF8 is good for text that contains mostly ASCII chars and the occasional
> Unicode international chars. It's also generally ok for storing and passing
> strings between apps.

That's not completely correct. UTF-8 is also used out there for
applications that are almost Unicode-only. I'd say it is a matter of
what the projects are written for. See below.
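
To make the size side of this concrete, here is a minimal sketch in
Python (not PHP). The sample strings and the helper name encoded_sizes
are made up for illustration; it only compares encoded byte counts and
says nothing about processing speed.

```python
# Illustrative sample strings (assumptions, not taken from this thread).
samples = {
    "ascii": "Hello, world",
    "cyrillic": "Привет, мир",
    "japanese": "こんにちは世界",
}

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, both without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: utf-8={utf8} bytes, utf-16={utf16} bytes")
```

Which encoding is smaller depends on the script of the text, so the
answer really depends on what kind of data a project handles.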

>
> Still, having variable-width encoding UTF8 or UTF16 doesn't cut it for
> general use to me as in tests it shows drastic slowdown when the script
> needs to do heavy string processing. I'd rather have it take more RAM for
> Unicode strings while being fast, and use Latin-1 when what I need is
> Latin-1.

The problem I have with UTF-16 is that it does not fit well with how
PHP is used. While you are right about the performance vs. memory
usage trade-off, it is sadly only a small part of the problem. If you
take a look at the current implementation (trunk, which uses UTF-16),
we have to convert to UTF-8 almost everywhere we deal with external
APIs (file systems or other libraries). The win we may get from using
UTF-16 is almost completely lost to the conversion cost.
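
As a rough sketch of what happens at each such boundary (again in
Python, not the actual trunk code; to_external, from_external and the
file name are hypothetical), every call into a UTF-8 API needs a
transcoding step in each direction:

```python
def to_external(internal_utf16: bytes) -> bytes:
    """Transcode an internal UTF-16LE buffer to UTF-8 for an external API."""
    return internal_utf16.decode("utf-16-le").encode("utf-8")

def from_external(external_utf8: bytes) -> bytes:
    """Transcode UTF-8 bytes returned by an external API back to UTF-16LE."""
    return external_utf8.decode("utf-8").encode("utf-16-le")

# Hypothetical file name, held in the engine's internal UTF-16 form.
internal = "données/été.txt".encode("utf-16-le")

path_bytes = to_external(internal)      # needed before the external call
round_trip = from_external(path_bytes)  # needed for whatever comes back
```

Each conversion walks and copies the whole string, so the cost grows
with both the string length and the number of external calls.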

That obviously does not apply to scripts using only core PHP features
(no file access, no extension usage, etc.), but such scripts are
hardly real-world use cases.

Please note that I'm not voting against UTF-16 or for UTF-8, but I
would like to have a real evaluation this time, unlike what was done
for trunk a couple of years ago.

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
