On Tue, Jun 05, 2001 at 05:39:36PM -0400, Bryan C . Warnock wrote:
> Some languages don't have upper or lower case.  Are tests and translations 
> on caseless characters true or false?  (Or undefined?)  

I'd say undefined.

> Should the same Unicode character, when used in two different languages, be 
> string equivalent?  

YES. Definitely. Same Unicode character, same thing. If you wanted something
else, use a different Unicode character.
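To make the "same character, same thing" rule concrete, here is a minimal Python sketch. Python is used only for illustration; nothing here is a claim about how Perl behaves. Comparison is by code point, so a letter shared by two languages compares equal, while a look-alike from another script does not.

```python
import unicodedata

latin_a = "A"          # U+0041 LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A, visually similar

# Equality is by code point, regardless of which language the text is "in".
english_word = "Apple"
german_word = "Apfel"
same_first_letter = english_word[0] == german_word[0]  # both are U+0041

# A look-alike character from another script is a different character.
lookalikes_equal = latin_a == cyrillic_a

print(same_first_letter, lookalikes_equal)
print(unicodedata.name(latin_a), "/", unicodedata.name(cyrillic_a))
```

If you want the Cyrillic letter, you use U+0410; nothing about the surrounding language changes what U+0041 means.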

> Asciibetical order is one thing, as it (roughly) maps alphabetical order for 
> English.  But unless you've been blessed with a root language for Unicode 
> mapping (such as Arabic), Unicodical sorting is going to be nonsensical, as 
> you hop between your language variants and the characters encoded somewhere 
> else (as in Farsi).  And, of course, there are several different orderings 
> for eastern glyph languages, IINM.

Not our problem. There are collation sequences within the various
"subsets", and these'll work fine if we go by UTR#10. If you ask for
a nonsensical comparison between two different languages, you'll get
one.
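The gap between raw code-point ("Unicodical") order and a real collation can be sketched in Python. The `crude_key` function below is a hypothetical, deliberately simplified stand-in and is NOT an implementation of UTR#10; a real program would use a proper Unicode Collation Algorithm implementation (ICU, for example).

```python
import unicodedata

words = ["zebra", "Église", "apple", "Zulu"]

# Plain code-point order: 'Z' (U+005A) < 'a' (U+0061) < 'z' (U+007A) < 'É' (U+00C9),
# so capitals sort before lowercase and accented letters land after 'z'.
codepoint_order = sorted(words)

def crude_key(s: str) -> str:
    # Hypothetical, crude stand-in for a collation (not UTR#10):
    # decompose accented letters, drop the combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

crude_order = sorted(words, key=crude_key)

print(codepoint_order)  # ['Zulu', 'apple', 'zebra', 'Église']
print(crude_order)      # ['apple', 'Église', 'zebra', 'Zulu']
```

Even this toy key only approximates English dictionary order; per-language tailorings are exactly what UTR#10 and locale data are for, which is why they can live outside the core.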
 
> But I think it'd be too heavy to make Perl inherently locale-aware.  The 
> best, I think, would be to have Perl simply be Unicode neutral - to treat 
> the characters (with any equivalencies, etc.) as just data.

Strongly agree.

> That would allow all the locale-specific handling code to be 
> written/debugged/distributed separately from the core on its own timeframe.  

Strongly agree.

> Of course, being Unicode neutral, that still leaves some stuff (like case 
> determination) undefined.  So maybe there should be a default locale in 
> place - the current, or barring that, English, I suppose.

Default to ASCII-ish and make it very, very easy for locale handling
modules to override the various pieces of the puzzle.
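One way "default to ASCII-ish, easy to override" might look, sketched in Python under stated assumptions: the `case_hooks` table, `uc`, and `turkish_uc` are hypothetical names, not a proposed Perl API. The default only upcases a-z; a locale module swaps in its own mapping for one piece of the puzzle (here, Turkish dotted i, which uppercases to U+0130).

```python
import string

# Default, ASCII-ish upper-casing: only a-z change; every other character,
# including caseless ones, passes through untouched.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

def ascii_uc(s: str) -> str:
    return s.translate(_ASCII_UPPER)

# Hypothetical hook table that the "core" consults; locale modules may
# replace any entry without modifying the core itself.
case_hooks = {"uc": ascii_uc}

def uc(s: str) -> str:
    return case_hooks["uc"](s)

# Hypothetical Turkish locale module: lowercase dotted i becomes U+0130
# (LATIN CAPITAL LETTER I WITH DOT ABOVE); the rest uses str.upper().
def turkish_uc(s: str) -> str:
    return s.replace("i", "\u0130").upper()

default_result = uc("café")      # "CAFé": é is outside the ASCII default
case_hooks["uc"] = turkish_uc    # the locale module overrides one piece
turkish_result = uc("istanbul")  # "İSTANBUL"
print(default_result, turkish_result)
```

The core never needs to know about Turkish; it only needs a well-defined place where a module can plug in.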

-- 
It can be hard to tell an English bigot from a monoglot with an
inferiority complex, but one cannot tell a Welshman any thing a 
tall.
    - Geraint Jones.
