Ok, I'm still lost on the language thing. I'm not arguing, I just don't
get it, and I feel that if I'm going to do some of the things that I
want to for Perl 6, I'm going to have to get it.

On Mon, 2004-04-12 at 11:43, Dan Sugalski wrote:

> Language
> ========
> *) Provides language-sensitive manipulation of characters (case mangling)
> *) Provides language-sensitive comparisons

Those two things do not seem to me to need language-specific strings at
all. They certainly need to understand the language in which they are
operating (avoiding the use of the word locale here, as per Larry's
concerns), but why does the language of origin of the string matter?

For example, in Perl5/Ponie:

        @names=<NAMES>;
        print "Phone Book: ", sort(@names), "\n";

In this example, I don't see why I would care that NAMES might be a
pseudo-handle that iterates over several databases, and returns strings
in the 7 different languages that those databases happen to contain. I
want my Phone Book sorted in a way that is appropriate to the language
of my phone book, with whatever special-case rules MY language has for
sorting funky foreign letters (and that might mean that even though a
comparison of two strings is POSSIBLE, in the current language it might
yield an exception, e.g. because Chinese and Japanese share a great many
characters that can be roughly converted, but neither has meaning in my
American English comparison).

More generally, an operation performed on a string, be it a read
(comparison) or a write (upcase, etc.), should be done in the way that
the *caller* expects, regardless of what legacy source the string came
from. I daren't even guess where a string that I got over a
Parrot-enabled CORBA might have been fetched from, or whether its
language is still in use if it sat in a cache somewhere for 200 years,
and it damn well better not affect my sorting, no?
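
To make the "caller decides" idea concrete, here is a minimal sketch in
Python (not Perl or Parrot code). The function name upcase and its lang
argument are made up for illustration, and only the Turkish dotted/dotless
"i" rule is modelled. The point is just that the string value carries no
language tag; the rules arrive with the call.

```python
# Hypothetical sketch: the case-mapping rules come from the caller's
# language argument, not from anything stored on the string itself.

def upcase(text: str, lang: str = "en") -> str:
    """Uppercase `text` under the rules of `lang` (only 'tr' is special here)."""
    if lang == "tr":
        # Turkish: dotted i uppercases to dotted capital I (U+0130),
        # and dotless i (U+0131) uppercases to plain I.
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


word = "istanbul"           # plain value, no language attached
print(upcase(word, "en"))   # ISTANBUL
print(upcase(word, "tr"))   # same word, Turkish rules: leading U+0130
```

The same value gives two different results, and which one you get depends
only on what the caller asked for.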

Ok, so that's my take... what am I missing?

> *) Provides language-sensitive character overrides ('ll' treated as a 
> single character, for example, in Spanish if that's still desired)
> *) Provides language-sensitive grouping overrides.

Ah, and here we come to my biggest point of confusion.

You describe logic that surrounds a given language, but you'll never
need "cmp" to know how to compare Spanish "ll" to English "ll", for
example. In fact, that doesn't even make sense to me. What you will need
is for cmp to know the Spanish comparison rules so that when it gets two
strings to compare, and it is asked to do so in Spanish, the proper
thing will happen.
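
As a rough illustration of that last sentence, here is a small Python
sketch (not Perl or Parrot code). The names collation_key and sort_names,
and the language tag "es-traditional", are hypothetical. It only models
the traditional Spanish rule that "ch" and "ll" are single letters sorting
after "c" and "l"; accents and everything else are ignored.

```python
# Hypothetical sketch: collation rules are picked by the language the
# caller asks for, never by a tag stored on the strings being compared.

SPANISH_DIGRAPHS = {"ch": "c", "ll": "l"}  # digraph -> letter it sorts after


def collation_key(text, lang):
    """Build a sort key for `text` under the rules of `lang`."""
    text = text.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if lang == "es-traditional" and pair in SPANISH_DIGRAPHS:
            # Treat the digraph as one letter placed right after its base.
            key.append((SPANISH_DIGRAPHS[pair], 1))
            i += 2
        else:
            key.append((text[i], 0))
            i += 1
    return key


def sort_names(names, lang):
    """Sort `names` the way a reader of `lang` would expect."""
    return sorted(names, key=lambda name: collation_key(name, lang))


names = ["Lopez", "Llano", "Luna"]
print(sort_names(names, "en"))              # ['Llano', 'Lopez', 'Luna']
print(sort_names(names, "es-traditional"))  # ['Lopez', 'Luna', 'Llano']
```

Nothing in names says which language each entry "is"; the comparison never
has to reconcile a Spanish "ll" with an English "ll", because only one set
of rules is in force per sort.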

I guess this boils down to two choices:

a) All strings will have the user's language by default

or

b) Strings will have different languages and behave according to their
"sources" regardless of the native rules of the user.

"b" seems to me to yield very surprising results, and does not at all
justify the baggage placed inside a string. If I can be forgiven for
saying so, it's even close to Perl 4's $[, which allowed you to change
the base index of arrays, only here you're doing it as a property on a
string, so that I can't trust that any string will behave the way I
expect unless I "untaint" it.

Again, I'm asking for corrections here.

> IW: Mush together (either concatenate or substr replacement) two 
> strings of different languages but same charset

According to whose rules? Does it make sense to merge an American
English string with a Japanese string unless you have a target language?

This means that someone's rules must become dominant, and as a
programmer, I'm expecting that to be neither string a nor string b, but
the user's. If the user happens to be Portuguese, then I would expect
that some kind of exception is going to emerge, but if the user is
Japanese, then it makes sense, and American English can be treated as
romaji, and an exception thrown if non-romaji ASCII characters are used.
Again, this is not something that the STRING can really have much of a
clue about. It's all context.

What is the reason for every string value carrying around such context?
Certainly numbers don't carry around their base as context, and yet
that's critical when converting to a string!
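
The number analogy can be shown in a few lines of Python (again just a
sketch, not Perl or Parrot code; to_text is a made-up helper). The integer
has no base of its own; the base is supplied at the moment of conversion.

```python
# Hypothetical sketch: an integer value carries no base. The base is
# context given by the caller when converting to or from text.

def to_text(value: int, base: int) -> str:
    """Render `value` in one of a few supported bases."""
    formats = {2: "b", 8: "o", 10: "d", 16: "x"}
    return format(value, formats[base])


n = 255
print(to_text(n, 10))  # 255
print(to_text(n, 16))  # ff
print(to_text(n, 2))   # 11111111

# Parsing works the same way: the base is an argument, not a property.
assert int("ff", 16) == int("255", 10) == n
```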

-- 
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

