On Sun, Mar 14, 2010 at 11:23 PM, Jordi Boggiano <j.boggi...@seld.be> wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_for...@fmethod.com> wrote:
>> UTF8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to Asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that Asian countries have
> so much English content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.
This is true, since most of the text data interchanged on the Internet is
HTML, in which such characters always appear interleaved with alphabetic
(ASCII) tags.

Moriyoshi
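To make the size argument above concrete, here is a small Python sketch that counts the bytes of the same text in UTF-8, UTF-16 and UTF-32 (no BOM). The sample characters and the HTML snippet are made-up illustrations, not measurements of any real corpus, and `encoded_sizes` is a hypothetical helper name.

```python
# Compare encoded sizes of the same text in UTF-8, UTF-16 and UTF-32.
# The sample strings below are hypothetical illustrations, not measured data.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in each encoding, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Per-character cost: ASCII, a BMP CJK ideograph (U+65E5),
# and a supplementary-plane character (U+20000).
for ch in ["a", "\u65e5", "\U00020000"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Mixed content: three Japanese characters wrapped in ASCII HTML markup.
html = '<p class="note"><a href="/ja/">\u65e5\u672c\u8a9e</a></p>'
print(encoded_sizes(html))
```

For a single BMP ideograph UTF-8 needs 3 bytes against 2 in UTF-16, and a supplementary-plane character takes 4 bytes in both, so pure CJK text is smaller in UTF-16. In the markup-heavy sample (39 ASCII characters plus 3 ideographs) UTF-8 comes to 48 bytes versus 84 for UTF-16 and 168 for UTF-32, which is the effect described above.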