On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_for...@fmethod.com> wrote:
> UTF8 also takes 4 bytes for representing characters in the higher bit
> planes, as quite a lot of bits are lost for every char in order to describe
> how long the code point is, and when it ends and so on. This means
> memory-wise it may not be of big benefit to asian countries.
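
For reference, the per-character cost being discussed can be checked with a short Python sketch. The helper name encoded_sizes and the sample strings are only illustrative assumptions, and the numbers say nothing about real-world corpora; they just show how many bytes each encoding form needs for an ASCII character, a common CJK character, a character outside the Basic Multilingual Plane, and a small markup-plus-CJK mix.

```python
# Minimal sketch: compare how many bytes a string needs in each encoding form.
# The "-le" codec variants are used so that no byte-order mark is counted.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text):
    """Return a dict mapping each encoding name to the encoded byte length."""
    return {name: len(text.encode(name)) for name in ENCODINGS}


# Illustrative samples (assumptions, chosen only to cover each case).
samples = {
    "ASCII letter": "a",
    "BMP CJK character (U+4E2D)": "\u4e2d",
    "Plane 2 CJK character (U+20000)": "\U00020000",
    "ASCII markup around two CJK characters": "<p>\u4e2d\u6587</p>",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

With these samples, the BMP CJK character costs 3 bytes in UTF-8 against 2 in UTF-16, the Plane 2 character costs 4 bytes in all three forms, and the mixed string comes out smaller in UTF-8 because the ASCII part takes one byte per character.
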
I remember Brian Aker saying that they chose to work internally with UTF-8 for Drizzle. His explanation was that Asian countries have so much English content mixed in that, on average, UTF-8 still had a lower footprint than UTF-16/32 even for them. I do not know where the stats came from, but if there is any truth to it, it is worth considering.

Cheers,
Jordi

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php