Hi, I used to work at a job where we used UTF-16 for embedded applications. Our company chose UTF-16 over UTF-8 because its 16-bit code units were thought to make it faster and more efficient to process than UTF-8. However, there's no reason why UTF-8 has to be drastically slower. The truth is, even we could have used UTF-8 there. And I don't buy the whole byte size / memory argument either. Even in our restricted embedded environments, that was never a real consideration, because a well-written program won't bloat memory by holding too many strings. That's what MySQL is for.
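To make the "no reason UTF-8 has to be drastically slower" point concrete, here is a minimal Python sketch of a strict UTF-8 decoder. It says nothing about actual benchmark numbers; it only shows that the per-character work is one lead-byte check followed by a few shifts and masks. The function name `decode_utf8` is hypothetical, and real implementations (in C, with table lookups or SIMD) are far faster than this loop.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into code points (minimal, strict sketch)."""
    code_points = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                  # 1 byte: 0xxxxxxx (plain ASCII)
            code_points.append(b0)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:         # 2 bytes: 110xxxxx 10xxxxxx
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:       # 3 bytes: 1110xxxx + 2 continuation bytes
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:       # 4 bytes: 11110xxx + 3 continuation bytes
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(1, length):
            b = data[i + j]
            if (b & 0xC0) != 0x80:     # continuation bytes must look like 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, values above U+10FFFF, and UTF-16 surrogates.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

Note that UTF-16 is not fixed-width either: characters outside the Basic Multilingual Plane take a surrogate pair, so a correct UTF-16 decoder needs a similar branch.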
Apple uses UTF-16 for CFString and NSString data. But elsewhere (and on the web!), pretty much everyone uses UTF-8. You should implement UTF-8, with a view to still allowing UTF-16 support to be added later on. That is to say, the encoding should be wrapped and switchable underneath. Of course, all that is easier said than done with PHP. But that's the right way to do it.

On Sun, Mar 14, 2010 at 2:23 PM, Jordi Boggiano <j.boggi...@seld.be> wrote:
> On Sun, Mar 14, 2010 at 12:03 PM, Stan Vassilev <sv_for...@fmethod.com> wrote:
>> UTF8 also takes 4 bytes for representing characters in the higher bit
>> planes, as quite a lot of bits are lost for every char in order to describe
>> how long the code point is, and when it ends and so on. This means
>> memory-wise it may not be of big benefit to asian countries.
>
> I remember Brian Aker saying that they chose to work internally with
> UTF-8 for Drizzle. His explanation of it was that asian countries have
> so much english content mixed in that on average even for them UTF-8
> still had a lower footprint than UTF-16/32. I do not know where the
> stats came from, but if it holds any truth it is worth considering.
>
> Cheers,
> Jordi
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
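Two ideas in this thread can be illustrated together: wrapping the encoding so it is switchable underneath, and the footprint question raised by Stan and Jordi. The sketch below is a hypothetical Python wrapper (`EncodedText` and its methods are illustrative names, not PHP or Drizzle APIs) that stores bytes in a chosen codec while callers work with text. The sample strings are made up and do not reproduce the Drizzle statistics; they only show the trade-off: ASCII-heavy markup is smaller in UTF-8, pure CJK text is smaller in UTF-16, and characters outside the Basic Multilingual Plane cost 4 bytes in both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical wrapper: callers see text, the storage encoding is swappable."""
    raw: bytes
    encoding: str = "utf-8"

    @classmethod
    def from_str(cls, text: str, encoding: str = "utf-8") -> "EncodedText":
        return cls(text.encode(encoding), encoding)

    def text(self) -> str:
        return self.raw.decode(self.encoding)

    def byte_length(self) -> int:
        # Storage cost depends on the chosen encoding.
        return len(self.raw)

    def reencode(self, encoding: str) -> "EncodedText":
        # Switch the underlying encoding without changing the content.
        return EncodedText.from_str(self.text(), encoding)

    def __len__(self) -> int:
        # Length in code points, independent of the storage encoding.
        return len(self.text())


# "utf-16-le" is used instead of "utf-16" so Python does not prepend a 2-byte BOM.
samples = {
    "ascii": "Hello, world",
    "cjk": "日本語のテキスト",
    "mixed": '<a href="/docs">日本語</a>',
    "emoji": "😀",
}

for name, s in samples.items():
    u8 = EncodedText.from_str(s, "utf-8")
    u16 = u8.reencode("utf-16-le")
    print(f"{name:6} chars={len(u8):3} utf-8={u8.byte_length():3} utf-16={u16.byte_length():3}")
```

Because callers only use `text()` and `len()`, the storage codec could be changed later without touching them, which is the point of keeping the encoding behind a wrapper.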