On Tue, Jun 05, 2001 at 05:39:36PM -0400, Bryan C . Warnock wrote:
> Some languages don't have upper or lower case. Are tests and translations
> on caseless characters true or false? (Or undefined?)
I'd say undefined.
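For comparison only, here is how one existing language answers the question. This is a small Python sketch, not a proposal for Perl: Python's string methods report False for both case tests on a caseless character and leave it untouched under case mapping. That is one possible definition of the "undefined" case, and I am not claiming it is the right one.

```python
# A caseless character: CJK ideograph U+4E2D, which has no upper or lower form.
ch = "\u4e2d"

is_upper = ch.isupper()   # False: the string contains no cased characters
is_lower = ch.islower()   # False, for the same reason
# Case translation is a no-op on a caseless character.
unchanged = ch.upper() == ch and ch.lower() == ch

print(is_upper, is_lower, unchanged)
```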
> Should the same Unicode character, when used in two different languages, be
> string equivalent?
YES. Definitely. Same Unicode character, same thing. You wanted something
else, use a different Unicode character.
> Asciibetical order is one thing, as it (roughly) maps alphabetical order for
> English. But unless you've been blessed with a root language for Unicode
> mapping (such as Arabic), Unicodical sorting is going to be non-sensical, as
> you hop between your language variants and the characters encoded somewhere
> else (as in Farsi). And, of course, there are several different orderings
> for eastern glyph languages, IINM.
Not our problem. There are collation sequences within the various
"subsets", and these'll work fine if we go by UTR#10. If you ask for
a non-sensical comparison between two different languages, you'll get
one.
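A rough sketch of the difference between raw code point order and a collation key, in Python. The `toy_key` function is hypothetical and deliberately crude: it is NOT the UTR#10 algorithm, only a stand-in showing that sorting on a derived key gives a different answer than sorting on code points.

```python
import unicodedata

words = ["zebra", "Apple", "\u00e9clair", "apple"]  # \u00e9 is e-acute

# Plain code point order:
# 'A' (U+0041) < 'a' (U+0061) < 'z' (U+007A) < e-acute (U+00E9).
by_code_point = sorted(words)

def toy_key(s):
    """Hypothetical, crude stand-in for a collation key (not UTR#10):
    decompose, drop combining marks, fold case; ties broken by code point."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)

by_toy_key = sorted(words, key=toy_key)

print(by_code_point)  # e-acute word sorts last, after "zebra"
print(by_toy_key)     # e-acute word sorts between "apple" and "zebra"
```

A real implementation would build its keys from the UTR#10 collation element table instead of this shortcut.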
> But I think it'd be too heavy to make Perl inherently locale-aware. The
> best, I think, would be to have Perl simply be Unicode neutral - to treat
> the characters (with any equivalencies, etc) as just data
Strongly agree.
> That would allow all the locale-specific handling code to be
> written/debugged/distributed separately from the core on its own timeframe.
Strongly agree.
> Of course, being Unicode neutral, that still leaves some stuff (like case
> determination) undefined. So maybe there should be a default locale in
> place - the current, or barring that, English, I suppose.
Default to ASCII-ish and make it very, very easy for locale handling
modules to override the various pieces of the puzzle.
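One way this could look, sketched in Python for illustration. Every name here (`handlers`, `override`, `upper`) is made up for the example and does not correspond to any existing or planned Perl interface. The core ships ASCII-only defaults, and a locale module swaps in its own piece.

```python
import string

# Hypothetical hook table: ASCII-only defaults that a locale module may replace.
_ASCII_UP = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

handlers = {
    "upper": lambda s: s.translate(_ASCII_UP),  # touches only a-z
    "sort_key": lambda s: s,                    # plain code point order
}

def override(name, func):
    """Install a replacement for one piece of the puzzle; return the old one."""
    if name not in handlers:
        raise KeyError(name)
    old = handlers[name]
    handlers[name] = func
    return old

def upper(s):
    """Uppercase via whichever handler is currently installed."""
    return handlers["upper"](s)

word = "stra\u00dfe"                 # contains U+00DF, sharp s
default_result = upper(word)         # sharp s is outside ASCII, so it is left alone
old = override("upper", str.upper)   # stand-in for a locale module's mapping
overridden_result = upper(word)      # Python's full case mapping expands it to "SS"
override("upper", old)               # restore the ASCII-ish default
restored_result = upper(word)
```

Returning the previous handler from `override` lets a module restore the default, or chain to it, without the core knowing anything about locales.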
--
It can be hard to tell an English bigot from a monoglot with an
inferiority complex, but one cannot tell a Welshman any thing a
tall.
- Geraint Jones.