Re: Plans for string processing

Aaron Sherman Tue, 13 Apr 2004 16:24:03 -0700

Thanks for your response. I'm not sure you and I are talking about
exactly the same thing, since you state that the logical extension, if
not the outright goal, of an alternative approach would be an exclusionary
monoculture. I'm not convinced that's quite right...

On Tue, 2004-04-13 at 15:06, Dan Sugalski wrote:

> >>  *) Provides language-sensitive manipulation of characters (case mangling)
> >>  *) Provides language-sensitive comparisons
> >
> >Those two things do not seem to me to need language-specific strings at
> >all. They certainly need to understand the language in which they are
> >operating (avoiding the use of the word locale here, as per Larry's
> >concerns), but why does the language of origin of the string matter?
> 
> Because the way a string is upcased/downcased/titlecased depends on 
> the language the string came from. The treatment of accents and a 
> number of specific character sequences depends on the language the 
> string came from.

> Ignore it and, well, you're going to find that 
> you're messing up the display of someone's name. That strikes me as 
> rather rude.

For proper names, you may have a point (though the ordering of names in
a phone book, for example, often follows the language of the book, not
the origin of the names), and in some forms of string processing, that
kind of deference to the origin of a word may turn out to be useful. I
do "get" that much.

What I'm not getting is this:

      * Why do we assume that the language property of a string will be
        the language from which the word actually originates rather
        than the locale of the database / web site / file server /
        whatever that we received it from? That could actually result in
        dealing with native words according to the rules of foreign
        languages, and boy-howdy is that going to be fun to debug.
      * Why is it valuable enough to justify attaching a value to every
        string ever created, rather than creating an abstraction at a
        higher level (e.g. a class)?
      * Why wouldn't you do the same thing for MIME type, as strings may
        also (and perhaps more often) contain data which is more
        appropriately tagged that way? The SpamAssassin guys would love
        you for this!
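A minimal sketch of the "higher level" alternative from the second and third questions, written in Python purely for illustration. Core strings stay untagged; code that cares wraps them in a class carrying an optional language tag (and, per the MIME question, an optional MIME type). The names TaggedText and upcase are hypothetical, and the only language-specific rule shown is the Turkish dotted i, which upcases to U+0130 rather than ASCII "I" (a standard example of language-dependent case mapping; Python's str.upper itself is language-independent).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical wrapper: an ordinary string plus optional metadata.

    Core strings stay untagged; only code that cares about language or
    MIME type pays for carrying the extra fields.
    """
    text: str
    lang: Optional[str] = None  # e.g. "tr", "fr"; None means "use the locale"
    mime: Optional[str] = None  # e.g. "text/plain", "text/html"


def upcase(t: TaggedText, locale_lang: str) -> str:
    """Upcase t using its language tag if present, else the caller's locale.

    Illustrative only: the sole language-specific rule here is Turkish
    dotted i, which upcases to U+0130 rather than to ASCII "I".
    """
    lang = t.lang or locale_lang
    if lang == "tr":
        # Map dotted i to capital dotted I before the generic upcase.
        return t.text.replace("i", "\u0130").upper()
    return t.text.upper()


# The tag is consulted only when the application chose to attach one.
print(upcase(TaggedText("istanbul", lang="tr"), locale_lang="en"))  # İSTANBUL
print(upcase(TaggedText("istanbul"), locale_lang="en"))             # ISTANBUL
```

The design choice being illustrated: untagged text falls back to the caller's locale, so programs that never attach a tag pay nothing for it.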

> What I don't want to do is *force* uniformity. Some of us do care.

Hey, that's a bit of a low blow. I care quite a bit, or I would not ask.
I'm not saying that the guy who wants to sort names according to their
source language is wrong; I'm saying that he doesn't need core support
in Parrot to do it, so I'm curious why it's in there.

> We've tried the whole monoculture thing before.

I just don't think that moving language up a layer or two of abstraction
enforces a monoculture. Again, I'm willing to see the light if someone
can explain it.

A lot of your response is about "enforcing", and I'm not sure how I gave
the impression that this is an enforcement issue (or perhaps you think
that non-localization is something that needs to be enforced?). I just
can't see why every string needs to carry around this kind of
world-view-altering context when 99% of programs that use string data
(even those that use mixed encodings) won't want to apply that context,
but will instead perform all operations according to their locale. Am I
wrong about that?

One thing that was not answered, though, is what happens in terms of
dominance. When sorting French and Norwegian Unicode strings, who loses
(or wins?) when you compare them? Comparing across language boundaries
would be a monumental task, and any result would instantly be reviled as
wrong by every language purist in the world. To my knowledge, no one has
ever published a uniform way to compare two words, much less arbitrary
text, unless you are willing to do so using the rules of one and only
one culture. (I say "culture" because the rules of a culture are often
incompatible with the strict rules of any one of its source languages.)
So, if you have to convert in order to compare, whose language do you
do the comparison in? You can't really rely on LHS vs. RHS, since a sort
will swap the operands many times (and C<$a cmp $b> had better equal
C<-($b cmp $a)>, or your sort may never terminate!)
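To make the LHS-vs-RHS problem concrete, here is a minimal Python sketch. The per-language sort keys are invented simplifications (only the letter "å" is treated specially: Norwegian sorts it after "z", while French primary comparison ignores the accent), and the names cmp_lhs_wins and cmp_for_context are hypothetical. It shows that letting the left operand's language tag pick the collation breaks the requirement that C<$a cmp $b> equal C<-($b cmp $a)>, while choosing a single collation from the sort's context preserves it.

```python
from functools import cmp_to_key

# Hypothetical, heavily simplified primary sort keys for two languages.
# Real collation (e.g. the Unicode Collation Algorithm plus per-language
# tailorings) is far more involved; only the letter "å" differs here.


def norwegian_key(s):
    # Norwegian sorts "å" after "z" (at the end of the alphabet).
    return [ord("z") + 1 if ch == "å" else ord(ch) for ch in s]


def french_key(s):
    # French primary comparison ignores the ring accent: "å" ranks as "a".
    return [ord("a") if ch == "å" else ord(ch) for ch in s]


KEYS = {"no": norwegian_key, "fr": french_key}


def three_way(x, y):
    # Return -1, 0 or 1, like Perl's cmp.
    return (x > y) - (x < y)


def cmp_lhs_wins(a, b):
    # Strategy 1: compare using the language tag of the left operand.
    key = KEYS[a[1]]
    return three_way(key(a[0]), key(b[0]))


def cmp_for_context(lang):
    # Strategy 2: the sort's context (not either operand) picks one collation.
    key = KEYS[lang]
    return lambda a, b: three_way(key(a[0]), key(b[0]))


a = ("åre", "no")  # (text, language tag)
b = ("bok", "fr")

# LHS-wins is not antisymmetric: both calls report "greater than".
print(cmp_lhs_wins(a, b), cmp_lhs_wins(b, a))  # 1 1

# A single context-chosen collation keeps cmp(a, b) == -cmp(b, a).
cmp_no = cmp_for_context("no")
print(cmp_no(a, b), cmp_no(b, a))  # 1 -1
print(sorted([a, b], key=cmp_to_key(cmp_no)))  # [('bok', 'fr'), ('åre', 'no')]
```

With the context-chosen comparator, the per-string tags play no part in the sort, which is what keeps the comparison consistent.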

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback
