On 2005-04-11 15:40, "gcomnz" <[EMAIL PROTECTED]> wrote:
> "日本語".chars would return <日 本 語>, which can probably be expressed
> with UTF8?

The string "日本語" is probably represented internally as UTF-8, but that
should have no effect on what .chars returns, which should, indeed, be
<日 本 語>, that is, an array whose elements are strings which each represent
one Unicode code point – irrespective of encoding.
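
To make the distinction concrete, here is a small sketch in Python 3, used
purely as an illustration because its strings are already sequences of code
points. It is not a proposal for Perl 6 syntax, and the variable names are my
own. The byte length changes with the encoding, but the list of characters
does not:

```python
# Illustration only: Python 3 strings are sequences of Unicode code points.
s = "日本語"

chars = list(s)                          # one one-character string per code point
utf8_len = len(s.encode("utf-8"))        # these code points take 3 bytes each in UTF-8
utf16_len = len(s.encode("utf-16-le"))   # and 2 bytes each in UTF-16

print(chars)                             # ['日', '本', '語']
print(len(chars), utf8_len, utf16_len)   # 3 9 6
```

Whatever the internal representation, the character-level view is the same
three elements.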

I think that, in general, at the level of Perl code, one “character” should be
one code point, and any higher-level support for combining and splitting
should be outside the core, in Unicode::Whatever.
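
A rough sketch of what that split would look like, again in Python 3 as a
stand-in: the standard unicodedata module here plays the role of the
hypothetical Unicode::Whatever, and nothing below is meant as actual Perl 6
API. At the code-point level, "e" followed by a combining acute accent is two
characters; composing them into one is left to the library:

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible glyph.
decomposed = "e\u0301"

# Composition is done by a library routine, not by the basic string type.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))        # 2
print(len(composed))          # 1
print(composed == "\u00e9")   # True
```

The core string type only counts code points; anything that knows about
combining sequences lives in the library.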



