On Sun, Mar 14, 2010 at 3:23 PM, Jordi Boggiano <j.boggi...@seld.be> wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_for...@fmethod.com> wrote:
>> UTF-8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to Asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that Asian countries have
> so much English content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.

The idea behind his reasoning was to optimize for the 90% of cases
while being "fast enough" for the remaining 10% (the numbers could have
been different, but that's the idea). From what I remember of our
discussions, he also mentioned a fast UTF-8 capable string processing
implementation (as fast as a UTF-16 one could be). I like this 90/10
approach, especially as it actually matches what we have in PHP.
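
To make the footprint argument concrete, here is a rough sketch in
Python that compares encoded sizes. The sample strings and the helper
name encoded_sizes are made up for illustration; they are not Brian's
stats and say nothing about real-world averages.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Hypothetical samples, chosen only to show the trade-off.
cjk_only = "日本語のテキスト"             # 8 BMP characters, no ASCII
mixed = '<a href="/news">日本語</a>'      # 20 ASCII characters + 3 CJK
supplementary = "\U00020000"              # one character outside the BMP

for label, sample in [("cjk_only", cjk_only),
                      ("mixed", mixed),
                      ("supplementary", supplementary)]:
    print(label, encoded_sizes(sample))
```

Pure CJK text is smaller in UTF-16 (16 bytes against 24 for UTF-8 here),
but as soon as ASCII markup is mixed in, UTF-8 wins (29 bytes against
46). A character outside the BMP costs 4 bytes in all three encodings,
so the 4-byte case is not specific to UTF-8.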

Cheers,
-- 
Pierre

@pierrejoye | http://blog.thepimp.net | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
