Hi!

> Just a quick point: most of the core is not ASCII. PHP strings are byte
> strings, completely divorced from any encoding. A few native functions
> assume ISO8859-1 (or possibly Windows CP1252), but mostly they just
> juggle which ever bytes you give them.

True, but not all extensions and functions behave this way. Some
(especially in intl, but not only there) assume UTF-8, for example, and
for some UTF-8 is a changeable default. In practice that default often
becomes the encoding actually used, since people are not aware that they
need to track their encoding, and most of them use UTF-8 anyway.

> The main exception I can think of is that numbers are often handled
> specially, with digits and separators as defined by ASCII. But since
> we're talking UTF-8, that doesn't need to change.

A more interesting case is, well, case conversion. We unknowingly used
locale-dependent lowercasing routines until the inevitable encounter
with the dreaded Turkish 'i', at which point we switched to forced
ASCII. So identifiers in the engine are more or less assumed to be
ASCII, even though you can sometimes sneak non-ASCII past it and it will
work, but weirdly.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
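
To illustrate the lowercasing issue described above, here is a minimal
sketch. It is written in Python for brevity, not in PHP or the engine's
C, and it does not reproduce the engine's actual code. Both `ascii_lower`
and `turkish_lower` are hypothetical helpers made up for this example;
`turkish_lower` only imitates what a Turkish locale would do to the
letter I.

```python
def ascii_lower(data: bytes) -> bytes:
    """Lowercase only the ASCII letters A-Z; leave every other byte untouched."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in data)


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using Turkish rules for the letter I."""
    # In Turkish, 'I' lowercases to dotless U+0131 and dotted U+0130 to 'i'.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


name = "FILE_INFO"

# Turkish rules turn 'I' into the dotless letter, so the result no longer
# matches the plain ASCII spelling a lookup table would be keyed on.
print(turkish_lower(name) == "file_info")                  # False

# ASCII-only folding does not depend on the locale.
print(ascii_lower(name.encode("utf-8")) == b"file_info")   # True

# Non-ASCII bytes pass through untouched: U+0130 is b'\xc4\xb0' in UTF-8.
print(ascii_lower("\u0130D".encode("utf-8")))              # b'\xc4\xb0d'
```

The last line also shows why non-ASCII identifiers "work, but weirdly"
under ASCII-only folding: their bytes are never case-folded, so two
spellings that differ only in the case of a non-ASCII letter stay
distinct.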