Re: [Q1] (Re: The strings design document)

2004-04-28 Thread Jarkko Hietaniemi
> I think you're basically forcing this concept onto national standards which lack it. I don't think that most of the national standards actually define the semantics of the characters they encode (categorizations, case mapping, sort order), and although they assign byte sequences to re

Re: [Q1] (Re: The strings design document)

2004-04-28 Thread Jeff Clites
On Apr 27, 2004, at 10:25 AM, Dan Sugalski wrote: At 9:40 AM -0700 4/27/04, Jeff Clites wrote: On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 i

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi
Dan Sugalski wrote:
> At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
>>> 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Dan Sugalski
At 7:57 PM +0300 4/27/04, Jarkko Hietaniemi wrote:
> 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a separ

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Dan Sugalski
At 9:40 AM -0700 4/27/04, Jeff Clites wrote: On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 is capital A, 776 is a combining diaeresis) as

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Larry Wall
I can't answer for Dan regarding implementation issues, but from a (computer) language point of view, consistency is better than correctness on this issue, because there is no single definition of "correct" until you specify what you mean by "correct". So at the first three Unicode support levels

Re: [Q1] (Re: The strings design document)

2004-04-27 Thread Jarkko Hietaniemi
> 1) ISO-8859-1 is used to represent text in several different languages, including German and Swedish. German and Swedish differ in their sort order, even for things they have in common. (For example, ö (o-with-diaeresis) is considered a separate letter in Swedish, but is just an accent

[Q1] (Re: The strings design document)

2004-04-27 Thread Jeff Clites
On Apr 23, 2004, at 2:43 PM, Dan Sugalski wrote: CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 is capital A, 776 is a combining diaeresis) as well as a set of categorizations o
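
For readers of this thread, the CHARACTER SET definition quoted above can be illustrated with a small sketch. It assumes Unicode as the character set and uses Python's standard unicodedata module purely as a stand-in; it is not the design document's own mechanism, and the helper name describe() is hypothetical.

```python
import unicodedata


def describe(code_point):
    """Return (name, general category) for a Unicode code point.

    Hypothetical helper for illustration only: the "meaning" of a code
    point is taken to be its Unicode character name, and its
    "categorization" its Unicode general category.
    """
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.category(ch)


# The two code points mentioned in the quoted definition.
for cp in (65, 776):
    name, category = describe(cp)
    print(cp, name, category)
# 65 LATIN CAPITAL LETTER A Lu
# 776 COMBINING DIAERESIS Mn
```

Here Lu is "uppercase letter" and Mn is "nonspacing mark". Nothing in this lookup says how the characters sort, which is consistent with the point argued elsewhere in the thread that sort order depends on the language and not only on the character set.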