Ok, I'm still lost on the language thing. I'm not arguing, I just don't get it, and I feel that if I'm going to do some of the things that I want to for Perl 6, I'm going to have to get it.
On Mon, 2004-04-12 at 11:43, Dan Sugalski wrote:

> Language
> ========
> *) Provides language-sensitive manipulation of characters (case mangling)
> *) Provides language-sensitive comparisons

Those two things do not seem to me to need language-specific strings at all. They certainly need to understand the language in which they are operating (avoiding the use of the word locale here, as per Larry's concerns), but why does the language of origin of the string matter?

For example, in Perl5/Ponie:

    @names=<NAMES>;
    print "Phone Book: ", sort(@names), "\n";

In this example, I don't see why I would care that NAMES might be a pseudo-handle that iterates over several databases, and returns strings in the 7 different languages that those databases happen to contain. I want my Phone Book sorted in a way that is appropriate to the language of my phone book, with whatever special-case rules MY language has for sorting funky foreign letters (and that might mean that even though a comparison of two strings is POSSIBLE, in the current language it might yield an exception, e.g. because Chinese and Japanese share a great many characters that can be roughly converted, but neither have meaning in my American English comparison).

More generally, an operation performed on a string (be it read (comparison) or write (upcase, etc)) should be done in the way that the *caller* expects, regardless of what legacy source the string came from (I daren't even guess where that string that I got over a Parrot-enabled CORBA might have been fetched from or if the language is still used since it was stored in a cache somewhere 200 years ago, and it damn well better not affect my sorting, no?)

Ok, so that's my take... what am I missing?

> *) Provides language-sensitive character overrides ('ll' treated as a
> single character, for example, in Spanish if that's still desired)
> *) Provides language-sensitive grouping overrides.

Ah, and here we come to my biggest point of confusion.
You describe logic that surrounds a given language, but you'll never need "cmp" to know how to compare Spanish "ll" to English "ll", for example. In fact, that doesn't even make sense to me. What you will need is for cmp to know the Spanish comparison rules so that when it gets two strings to compare, and it is asked to do so in Spanish, the proper thing will happen.

I guess this boils down to two choices:

a) All strings will have the user's language by default

or

b) Strings will have different languages and behave according to their "sources" regardless of the native rules of the user.

"b" seems to me to yield very surprising results, and does not at all justify the baggage placed inside a string. If I can be forgiven for saying so, it's even close to Perl 4's $[, which allowed you to change the semantics of arrays, only here, you're doing it as a property on a string so that I can't trust that any string will behave the way I expect unless I "untaint" it.

Again, I'm asking for corrections here.

> IW: Mush together (either concatenate or substr replacement) two
> strings of different languages but same charset

According to whose rules? Does it make sense to merge an American English string with a Japanese string unless you have a target language?

This means that someone's rules must become dominant, and as a programmer, I'm expecting that to be neither string a nor string b, but the user's. If the user happens to be Portuguese, then I would expect that some kind of exception is going to emerge, but if the user is Japanese, then it makes sense, and American English can be treated as romaji, and an exception thrown if non-romaji ASCII characters are used.

Again, this is not something that the STRING can really have much of a clue about. It's all context. What is the reason for every string value carrying around such context? Certainly numbers don't carry around their base as context, and yet that's critical when converting to a string!
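
To make choice "a" a little more concrete, here is a minimal sketch of a comparison whose rules come from the caller rather than from the strings. It is written in Python purely for illustration; it is not Perl, not Parrot's string API, and not a real collation implementation. The names ENGLISH, SPANISH_TRADITIONAL, collation_key and sort_for are all made up for this example, and the two alphabet tables are deliberately tiny assumptions (lower-case letters only, with "ch" and "ll" as single units in the traditional Spanish order).

```python
# Hypothetical, incomplete collation tables chosen by the CALLER.
# The strings being compared carry no language information at all.
ENGLISH = list("abcdefghijklmnopqrstuvwxyz")

# Traditional Spanish order (assumed here): "ch" follows "c",
# "ll" follows "l", and "ñ" follows "n".
SPANISH_TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
    "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
    "v", "w", "x", "y", "z",
]


def collation_key(text, alphabet):
    """Return a list of ranks for text under the caller's alphabet.

    Raises ValueError when text contains something the alphabet has
    no rule for, mirroring the "might yield an exception" case.
    """
    rank = {unit: index for index, unit in enumerate(alphabet)}
    # Try multi-character units such as "ll" before single letters.
    units = sorted(alphabet, key=len, reverse=True)
    lowered = text.lower()
    key = []
    pos = 0
    while pos < len(lowered):
        for unit in units:
            if lowered.startswith(unit, pos):
                key.append(rank[unit])
                pos += len(unit)
                break
        else:
            raise ValueError(
                "no collation rule for %r in this language" % lowered[pos]
            )
    return key


def sort_for(names, alphabet):
    """Sort names by the caller's rules, whatever their origin."""
    return sorted(names, key=lambda name: collation_key(name, alphabet))


names = ["luz", "llama", "lana"]
print(sort_for(names, ENGLISH))              # ['lana', 'llama', 'luz']
print(sort_for(names, SPANISH_TRADITIONAL))  # ['lana', 'luz', 'llama']
```

The same three strings sort differently depending only on which table the caller passes in, and a string such as "año" raises ValueError under the ENGLISH table because that table has no rule for "ñ". Nothing here needs the strings themselves to remember where they came from.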
--
Aaron Sherman <[EMAIL PROTECTED]>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback