RE: Unicode sorting...

Dan Sugalski Fri, 08 Jun 2001 12:02:00 -0700
At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote:

> > If this is the case, how would a regex like "^[a-zA-Z]" work (or other,
> > more sensitive characters)? If just about anything can come between A
> > and Z, and letters that might be there in a particular locale aren't in
> > another locale, then how will regex engine make the distinction?
>
>This syntax was designed for English. It just does not make any sense in
>Chinese. The Chinese just don't have sorting order for most of history. 
>The phonetic order and stroke order was introduced only couple of hundred 
>years ago.

The A-Z syntax is really shorthand for "all the uppercase letters" 
(originally, at least). I won't argue about the problems with sorting 
various sets of characters in various locales, but for regexes at least 
it's not an issue, because the point isn't sorting or ordering, it's 
identifying groups. We just need to make sure there's a named group for the 
different languages we know of--things like [[:kanji:]] or [[:hiragana:]], 
for example. (They should also be named in the language they represent, but 
I'm going to take a miss on trying to wedge an example in here, as I've a 
hard enough time getting letters with umlauts in.)
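
To make the range-versus-group distinction concrete, here is a rough sketch in 
Python. It is only an illustration of the idea, not a proposal for how Perl 
should do it. The helper names (is_uppercase_letter, is_hiragana) are made up 
for this example, and since Python's re module has no named classes of this 
kind, the sketch leans on the standard unicodedata module and treats "name 
starts with HIRAGANA LETTER" as a stand-in for a hiragana group.

```python
import re
import unicodedata

# A literal code point range: it only knows about the 26 ASCII capitals.
ascii_range = re.compile(r"^[A-Z]")


def is_uppercase_letter(ch: str) -> bool:
    """Group membership: true for any character Unicode classifies as Lu."""
    return unicodedata.category(ch) == "Lu"


def is_hiragana(ch: str) -> bool:
    """Assumed stand-in for a hiragana group, based on the character name."""
    return unicodedata.name(ch, "").startswith("HIRAGANA LETTER")


# Escapes are used so the example does not depend on the mail encoding.
# \u00c4 is A with an umlaut; \u3042 is HIRAGANA LETTER A.
for word in ["Zebra", "\u00c4rger", "\u3042\u3055"]:
    first = word[0]
    print(
        ascii(word),
        "range:", bool(ascii_range.match(word)),
        "uppercase group:", is_uppercase_letter(first),
        "hiragana group:", is_hiragana(first),
    )
```

The range test misses the capital with the umlaut, while the group test picks 
it up without caring where the character sorts in any locale.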

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk