Re: The internal string API

2001-06-29 Thread Dan Sugalski
At 07:57 PM 6/28/2001 -0500, Jarkko Hietaniemi wrote: >On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote: > > If I have a file in French, and a file in Chinese, I want one to > > be treated as French, and the other as Chinese. > >And what do you do once you have a list of, say, employees,

Re: The internal string API

2001-06-28 Thread Jarkko Hietaniemi
On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote: > On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote: > > >But a locale is a collection of user preferences. How I want > >my dates to be formatted, how I want my strings to be sorted. > > That's not right. If I do a text con

Re: The internal string API

2001-06-28 Thread Bart Lateur
On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote: >But a locale is a collection of user preferences. How I want >my dates to be formatted, how I want my strings to be sorted. That's not right. If I do a text conversion from Windows to Mac, I would want the source to use the CP-1522 lo

Re: The internal string API

2001-06-20 Thread David L. Nicol
Dave Mitchell wrote: > some sort of clone method With tree strings, at clone time they get reorged into minimal number of nodes: back to one big block if they are all the same type, or back to one block for each type transition if it is tagged data. Having the basic string type support arbi
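The clone-time reorganization described in this message can be sketched roughly as below. This is only an illustration, assuming a tree string has already been walked into an ordered list of `(tag, text)` nodes; the `compact` function and the tag values are hypothetical and do not come from the thread.

```python
from itertools import groupby


def compact(nodes):
    """Merge adjacent nodes that carry the same tag.

    `nodes` is an ordered list of (tag, text) pairs (a hypothetical
    flattened view of a tree string). The result has one block per
    tag transition, or a single block if every node has the same tag.
    """
    return [
        (tag, "".join(text for _, text in run))
        for tag, run in groupby(nodes, key=lambda node: node[0])
    ]


# Untagged data (all nodes share one tag) collapses to one big block.
print(compact([(None, "abc"), (None, "def"), (None, "ghi")]))

# Tagged data keeps one block per type transition.
print(compact([("en", "Hello "), ("en", "world "), ("zh", "你好"), ("en", "!")]))
```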

RE: The internal string API

2001-06-20 Thread Hong Zhang
> >> Taiwanese read traditional Chinese characters, but PRC people read > > >> simplified Chinese. Even if we take the same data, and same program (code), > > >> people just read differently. As an end user, I want to make the decision. > > >> It will drive me crazy if Perl renders/displays the text fil

Re: The internal string API

2001-06-20 Thread Dan Sugalski
At 05:16 PM 6/20/2001 +, Nick Ing-Simmons wrote: >Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: > >> Taiwanese read traditional Chinese characters, but PRC people read > >> simplified Chinese. Even if we take the same data, and same program (code), > >> people just read differently. As an end user,

RE: The internal string API

2001-06-20 Thread Dan Sugalski
At 10:31 AM 6/20/2001 -0700, Hong Zhang wrote: > > The one problem with copy-on-write is that, if we implement it in >software, > > we end up paying the price to check it on every string write. (No free > > depending on the hardware, alas) > > > > Not that this should shoot down the idea of COW s

RE: The internal string API

2001-06-20 Thread Hong Zhang
> The one problem with copy-on-write is that, if we implement it in software, > we end up paying the price to check it on every string write. (No free > depending on the hardware, alas) > > Not that this should shoot down the idea of COW strings, but it is a cost > that needs considering. (I

Re: The internal string API

2001-06-20 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >> Taiwanese read traditional Chinese characters, but PRC people read >> simplified Chinese. Even if we take the same data, and same program (code), >> people just read differently. As an end user, I want to make the decision. >> It will drive me crazy if P

Re: The internal string API

2001-06-20 Thread Dave Mitchell
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 05:43 PM 6/19/2001 -0500, David L. Nicol wrote: > > set $B to copy-on-write mode, so future changes to $B do not > > affect $A > > The one problem with copy-on-write is that, if we implement it in software, > we end up paying the price to che

Re: The internal string API

2001-06-20 Thread Dan Sugalski
At 05:43 PM 6/19/2001 -0500, David L. Nicol wrote: > set $B to copy-on-write mode, so future changes to $B do not > affect $A The one problem with copy-on-write is that, if we implement it in software, we end up paying the price to check it on every string write. (No free depending on
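To make the cost mentioned here concrete, the following is a minimal sketch of a software copy-on-write buffer. The `CowString` class and its methods are hypothetical names chosen for illustration, not anything proposed in the thread; the point is only that `_ensure_unique` has to run before every write, whether or not the buffer is actually shared.

```python
class CowString:
    """Hypothetical copy-on-write string buffer (illustrative only)."""

    def __init__(self, data):
        self._buf = bytearray(data)
        # One-element list so every sharer sees the same owner count.
        self._owners = [1]

    def copy(self):
        # Cheap copy: share the buffer and bump the owner count.
        other = CowString.__new__(CowString)
        other._buf = self._buf
        other._owners = self._owners
        self._owners[0] += 1
        return other

    def _ensure_unique(self):
        # This check is paid on every write, shared or not.
        if self._owners[0] > 1:
            self._owners[0] -= 1
            self._buf = bytearray(self._buf)  # the real copy happens here
            self._owners = [1]

    def set_byte(self, index, value):
        self._ensure_unique()
        self._buf[index] = value

    def to_bytes(self):
        return bytes(self._buf)


a = CowString(b"hello")
b = a.copy()              # no data copied yet
b.set_byte(0, ord("j"))   # first write to b triggers the copy
print(a.to_bytes(), b.to_bytes())
```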

Re: The internal string API

2001-06-20 Thread Dan Sugalski
At 03:17 PM 6/20/2001 +0200, Bart Lateur wrote: >On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote: > > >> * Do a substr operation by character and glyph > > > >The byte based is more useful. I have utf-8, and I want to substr it > >to another utf-8. It is painful to convert it or linear search

RE: The internal string API

2001-06-20 Thread Dan Sugalski
At 04:23 PM 6/19/2001 -0700, Hong Zhang wrote: >This is the common approach to complicated text representation; >the implementations I have seen include IBM IText and SGI >rope. For "rope", each rope is represented by one of: a simple >immutable string, a simple mutable string, a simple immutab

Re: The internal string API

2001-06-20 Thread Bart Lateur
On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote: >> * Do a substr operation by character and glyph > >The byte based is more useful. I have utf-8, and I want to substr it >to another utf-8. It is painful to convert it or linear search for >character position. I tend to agree. I currently
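The trade-off under discussion can be sketched as follows. The function names are hypothetical and nothing here is a proposed API: a byte-based substr on UTF-8 data needs no scan to locate its offsets, while a character-based one has to walk the buffer to translate character positions into byte positions.

```python
def substr_bytes(buf, start, length):
    # Byte offsets are used directly. The caller is responsible for
    # making sure they fall on character boundaries.
    return buf[start:start + length]


def char_to_byte_offset(buf, char_index):
    # Linear scan: every byte that is not a continuation byte
    # (10xxxxxx) starts a new character.
    seen = 0
    for pos, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:
            if seen == char_index:
                return pos
            seen += 1
    if seen == char_index:
        return len(buf)  # offset just past the last character
    raise IndexError("character index out of range")


def substr_chars(buf, start, length):
    # Two scans to find the byte range, then a plain byte slice.
    begin = char_to_byte_offset(buf, start)
    end = char_to_byte_offset(buf, start + length)
    return buf[begin:end]


text = "naïve café".encode("utf-8")
print(substr_bytes(text, 0, 2))
print(substr_chars(text, 2, 3).decode("utf-8"))
```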

RE: The internal string API

2001-06-19 Thread Hong Zhang
This is the common approach to complicated text representation; the implementations I have seen include IBM IText and SGI rope. For "rope", each rope is represented by one of: a simple immutable string, a simple mutable string, a simple immutable substring of another rope, or a binary node of
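A rough sketch of the node kinds listed in this message is given below. The snippet is cut off at the binary node, so the `Concat` class assumes that such a node joins two child ropes; that assumption, and all class and function names, are illustrative only and are not taken from IText or the SGI rope sources. The mutable-string variant is left out for brevity.

```python
class Leaf:
    """A simple immutable string."""
    def __init__(self, text):
        self.text = text
    def __len__(self):
        return len(self.text)
    def char_at(self, i):
        return self.text[i]
    def flatten(self):
        return self.text


class Substring:
    """An immutable window onto another rope."""
    def __init__(self, base, start, length):
        self.base, self.start, self.length = base, start, length
    def __len__(self):
        return self.length
    def char_at(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.base.char_at(self.start + i)
    def flatten(self):
        return self.base.flatten()[self.start:self.start + self.length]


class Concat:
    """Binary node, assumed here to join two child ropes."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = len(left) + len(right)
    def __len__(self):
        return self.length
    def char_at(self, i):
        if i < len(self.left):
            return self.left.char_at(i)
        return self.right.char_at(i - len(self.left))
    def flatten(self):
        return self.left.flatten() + self.right.flatten()


def insert(rope, pos, text):
    # Nondestructive insert: builds new nodes, leaves `rope` untouched.
    head = Substring(rope, 0, pos)
    tail = Substring(rope, pos, len(rope) - pos)
    return Concat(Concat(head, Leaf(text)), tail)


base = Leaf("hello world")
edited = insert(base, 5, ",")
print(edited.flatten(), "|", base.flatten())
```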

Re: The internal string API

2001-06-19 Thread David L. Nicol
Dan Sugalski wrote: > >If the internal string API is a tree instead of a contiguous memory block, > >the tagging could be done at the node or branch level. > > > >Besides, you get nondestructive inserts. > > Yup. The only problem is that it makes the string data significantly more > complex. I d

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 04:08 PM 6/19/2001 -0500, David L. Nicol wrote: >Dan Sugalski wrote: > > Hong Zhang wrote: > > > > > I don't see that the core should support language/locale in this detail. > > > I deal with a lot of mixed Chinese/English text files. There is no way to represent > > > it using a plain string, unless you w

Re: The internal string API

2001-06-19 Thread David L. Nicol
Dan Sugalski wrote: > Hong Zhang wrote: > > > I don't see that the core should support language/locale in this detail. > > I deal with a lot of mixed Chinese/English text files. There is no way to represent > > it using a plain string, unless you want to make string be rich-format-text > > -buffer. Current local

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> Taiwanese read traditional Chinese characters, but PRC people read > simplified Chinese. Even if we take the same data, and same program (code), > people just read differently. As an end user, I want to make the decision. > It will drive me crazy if Perl renders/displays the text file using > tradition

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:51 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: > > Gah. I thought (and I use the word loosely here) that locales generally > > specified how a particular character should be interpreted when there's > > some ambiguity--the high bit ASCII characters spring to mind, given > there's > > a doz

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> I think you misunderstand my point. It is not "a property of the code region", > but "a property of the context in which the code is running". For > example, > Taiwanese read traditional Chinese characters, but PRC people read > simplified Chinese. Even if we take the same data, and same program (code

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> Gah. I thought (and I use the word loosely here) that locales generally > specified how a particular character should be interpreted when there's > some ambiguity--the high bit ASCII characters spring to mind, given there's > a dozen or more different interpretations with them. I was under th

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:31 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: > > I think you misunderstand my point. It is not "a property of the code region", > > but "a property of the context in which the code is running". For > > example, > > Taiwanese read traditional Chinese characters, but PRC people read > > simp

RE: The internal string API

2001-06-19 Thread Dan Sugalski
At 12:25 PM 6/19/2001 -0700, Hong Zhang wrote: > > >What do you mean by character size if it does not support variable >length? > > > > Well, if strings are to be treated relatively abstractly, and we still >want > > to poke around through the string buffer, we need to know how big a > > characte

RE: The internal string API

2001-06-19 Thread Hong Zhang
> >What do you mean by character size if it does not support variable length? > > Well, if strings are to be treated relatively abstractly, and we still want > to poke around through the string buffer, we need to know how big a > character is. I agree on this. I think support variable length

RE: The internal string API

2001-06-19 Thread Hong Zhang
> * Convert from and to UTF-32 > * lengths in bytes, characters, and possibly glyphs > * character size (with the variable length ones reporting in negative numbers) What do you mean by character size if it does not support variable length? > * get and set the locale (This might not be the spot