On 2005-04-11 15:40, "gcomnz" <[EMAIL PROTECTED]> wrote:

> "日本語".chars would return <日 本 語>, which can probably be expressed with UTF8?

The string "日本語" is probably represented internally as UTF-8, but that should have no effect on what .chars returns, which should, indeed, be <日 本 語>, that is, an array whose elements are strings which each represent one Unicode code point – irrespective of encoding.

I think that, in general, at the level of Perl code, 1 "character" should be one code point, and any higher-level support for combining and splitting should be outside the core, in Unicode::Whatever.
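To make the distinction concrete, here is a rough sketch in Python (not Perl 6, whose .chars semantics are exactly what is under discussion here). It only illustrates the idea that the list of code points stays the same whatever encoding is used for storage, and that combining sequences are a separate, higher-level concern. The variable names are mine, chosen for illustration.

```python
import unicodedata

s = "日本語"

# One element per Unicode code point, independent of any encoding.
chars = list(s)

# The same string stored in two different encodings.
utf8_bytes = s.encode("utf-8")       # 3 bytes per character here
utf16_bytes = s.encode("utf-16-le")  # 2 bytes per character here

print(chars)                                      # ['日', '本', '語']
print(len(chars), len(utf8_bytes), len(utf16_bytes))  # 3 9 6

# Combining sequences: "e" followed by U+0301 COMBINING ACUTE ACCENT
# is two code points, even though it displays as one character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 2
print(len(composed))    # 1
```

In this sketch the code-point view is the default, and the combining and splitting logic lives in a library module (unicodedata), which is roughly the split between core and Unicode::Whatever that I am arguing for.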