At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote:
> > If this is the case, how would a regex like "^[a-zA-Z]" work (or other,
> > more sensitive characters)? If just about anything can come between A and
> > Z, and letters that might be there in a particular locale aren't in
> > another locale, then how will the regex engine make the distinction?
>
>This syntax was designed for English. It just does not make any sense in
>Chinese. The Chinese just didn't have a sorting order for most of their
>history. Phonetic order and stroke order were introduced only a couple of
>hundred years ago.
The A-Z syntax is really a shorthand for "all the uppercase letters"
(originally, at least). I won't argue the problems with sorting various sets
of characters in various locales, but for regexes it's not an issue, because
the point isn't sorting or ordering, it's identifying groups. We just need to
make sure there's a named group for each of the languages we know of, things
like [[:kanji:]] or [[:hiragana:]] for example. (They should also be named in
the language they represent, but I'm going to take a miss on trying to wedge
an example in here, as I've a hard enough time getting letters with umlauts
in.)
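To make the distinction concrete, here's a rough sketch in Python. The language is just a convenient choice for illustration, and the class names and the starts_with_class helper are hypothetical, not any engine's real API. A literal [A-Z] is matched by code point, so it misses an accented capital; a named class instead decides membership from Unicode character properties, with no ordering involved:

```python
import re
import unicodedata

# Hypothetical named classes, keyed the way a [[:name:]] lookup might be.
# Membership comes from Unicode properties/names, not from any sort order.
NAMED_CLASSES = {
    "upper": str.isupper,
    "hiragana": lambda ch: unicodedata.name(ch, "").startswith("HIRAGANA"),
    "kanji": lambda ch: unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"),
}


def starts_with_class(text: str, class_name: str) -> bool:
    """Return True if the first character of text is in the named class."""
    if not text:
        return False
    return NAMED_CLASSES[class_name](text[0])


# [A-Z] is a code-point range, so U+00C4 (A with diaeresis) falls outside it.
print(bool(re.match(r"^[A-Z]", "Zebra")))        # True
print(bool(re.match(r"^[A-Z]", "Ärger")))        # False
print(starts_with_class("Ärger", "upper"))       # True
print(starts_with_class("ひらがな", "hiragana"))  # True
print(starts_with_class("カタカナ", "hiragana"))  # False (katakana)
print(starts_with_class("漢字", "kanji"))         # True
```

The point of keying classes by name is that adding a script means adding an entry, not deciding where it sorts relative to A and Z.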
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk