Re: Applying regexen/grammars to objects (was Re: String API)

2003-08-25 Thread Benjamin Goldberg
Gordon Henriksen wrote: > > Benjamin Goldberg wrote: > > > Gordon Henriksen wrote: [snip] > > > [3] Unshift hack #1: Where commit appears in the above, exit the > > > grammar, trim the beginning of the string, and re-enter. (But that > > > forces the grammar author to discard the regex state, whe

RE: Applying regexen/grammars to objects (was Re: String API)

2003-08-25 Thread Gordon Henriksen
Benjamin Goldberg wrote: > Gordon Henriksen wrote: > > > Having a lazily slurped file string simply delays disaster, and > > opens the door for Very Big Mistakes. Such strings would have to be > > treated very delicately, or the program would behave very > > inefficiently or crash. > > Although

Applying regexen/grammars to objects (was Re: String API)

2003-08-25 Thread Benjamin Goldberg
Gordon Henriksen wrote: > > Now, I don't really have much of an opinion on compound strings in > general. I do want to address one particular argument, though—the lazily > slurped file string. > > On Thursday, August 21, 2003, at 07:22 , Benjamin Goldberg wrote: > > > A foolish question: can you

Re: String API

2003-08-24 Thread Benjamin Goldberg
Dan Sugalski wrote: > > At 12:07 AM -0400 8/19/03, Benjamin Goldberg wrote: > >There are a number of shortcomings in the API, which I'd like to address > >here, and propose improvements for. > > You're conflating language level strings with low-level strings. Don't. > > STRINGs, the parrot stru

Re: String API

2003-08-24 Thread Dan Sugalski
At 10:01 AM +0200 8/22/03, Leopold Toetsch wrote: Benjamin Goldberg <[EMAIL PROTECTED]> wrote: Leopold Toetsch wrote: I have problems imagining this kind of STRING. You lack sufficient imagination -- Larry's suggested that Perl6 strings may consist of a list of chunks. I can easily imagine

Re: String API

2003-08-24 Thread Dan Sugalski
At 12:57 PM +0200 8/21/03, Peter Gibbs wrote: If the string API is to be revised, I would like to suggest that consideration be given to having a single string vtable, merging the current encoding and chartype structures into a single one. I think this has been addressed, but in case it hasn't

Re: String API

2003-08-24 Thread Dan Sugalski
At 12:07 AM -0400 8/19/03, Benjamin Goldberg wrote: There are a number of shortcomings in the API, which I'd like to address here, and propose improvements for. You're conflating language level strings with low-level strings. Don't. STRINGs, the parrot structure and what S registers point to, are

Re: String API

2003-08-24 Thread Gordon Henriksen
Now, I don't really have much of an opinion on compound strings in general. I do want to address one particular argument, though—the lazily slurped file string. On Thursday, August 21, 2003, at 07:22 , Benjamin Goldberg wrote: A foolish question: can you imagine strings which are lazily read fr

Re: String API

2003-08-22 Thread Nicholas Clark
On Thu, Aug 21, 2003 at 06:37:52PM -0400, Benjamin Goldberg wrote: > > > Nicholas Clark wrote: > > Particularly when the regexp engine is written assuming O(1) random > > access. > > It doesn't *need* to assume O(1) random access; after all, it's never > accessing *randomly*, it's always access

Re: String API

2003-08-22 Thread Leopold Toetsch
e or 2 vtables (encoding stuff) ... > If we have it in a PerlString derived class, and do not make it part of > STRING*, then we cannot pass such strings to C functions defined to > accept strings in STRING* parameters, Such C functions must be aware of the string API anyway, they can't a

Re: String API

2003-08-21 Thread Benjamin Goldberg
Leopold Toetsch wrote: > > Benjamin Goldberg wrote: > > > > > Leopold Toetsch wrote: > > Not having an INTERP argument severely limits us, even in other ways. > > The INTERP argument is fine. The user defined encoding is/was my > problem. As in, you think we shouldn't have any, at all? > > Sim

Re: String API

2003-08-21 Thread Benjamin Goldberg
Nicholas Clark wrote: > > On Wed, Aug 20, 2003 at 07:19:42PM -0400, Benjamin Goldberg wrote: > > > Leopold Toetsch wrote: > > > > But these could be converted to utf32 as soon as they are seen. > > > > For a long string, that could be quite a bit of bloat. > > Jarkko's view is that the combin

Re: String API

2003-08-21 Thread Dan Sugalski
On Thu, 21 Aug 2003, Elizabeth Mattijsen wrote: > At 14:15 +0100 8/21/03, Nicholas Clark wrote: > >On Wed, Aug 20, 2003 at 07:19:42PM -0400, Benjamin Goldberg wrote: > > > Leopold Toetsch wrote: > > > > But these could be converted to utf32 as soon as they are seen. > > > For a long string, tha

Re: String API

2003-08-21 Thread Elizabeth Mattijsen
At 14:15 +0100 8/21/03, Nicholas Clark wrote: On Wed, Aug 20, 2003 at 07:19:42PM -0400, Benjamin Goldberg wrote: > Leopold Toetsch wrote: > > But these could be converted to utf32 as soon as they are seen. > For a long string, that could be quite a bit of bloat. Jarkko's view is that the combine

Re: String API

2003-08-21 Thread Nicholas Clark
On Wed, Aug 20, 2003 at 07:19:42PM -0400, Benjamin Goldberg wrote: > Leopold Toetsch wrote: > > But these could be converted to utf32 as soon as they are seen. > > For a long string, that could be quite a bit of bloat. Jarkko's view is that the combined hit of the size of the extra code to skip

Re: String API

2003-08-21 Thread Tom Hughes
In message <[EMAIL PROTECTED]> Peter Gibbs <[EMAIL PROTECTED]> wrote: > I do not believe that the two existing parameters are orthogonal, > so the number of charset (or whatever) entities would be less than > the cross product. e.g. the existing 2 chartypes x 4 encodings > would really onl

Re: String API

2003-08-21 Thread Peter Gibbs
If the string API is to be revised, I would like to suggest that consideration be given to having a single string vtable, merging the current encoding and chartype structures into a single one. This removes one pointer from each string header, and allows a single parameter to be used instead of

Re: String API

2003-08-20 Thread Leopold Toetsch
Benjamin Goldberg wrote: Leopold Toetsch wrote: Not having an INTERP argument severely limits us, even in other ways. The INTERP argument is fine. The user defined encoding is/was my problem. Similarly, that would eliminate the chance of a STRING* which is actually a lazily concatenated list of

Re: String API

2003-08-20 Thread Benjamin Goldberg
Leopold Toetsch wrote: > > Benjamin Goldberg <[EMAIL PROTECTED]> wrote: > > Leopold Toetsch wrote: > >> > >> Benjamin Goldberg <[EMAIL PROTECTED]> wrote: > >> > There are a number of shortcomings in the API, which I'd like to > >> > address here, and propose improvements for. > >> > >> > To allow

Re: String API

2003-08-20 Thread Leopold Toetsch
Benjamin Goldberg <[EMAIL PROTECTED]> wrote: > Leopold Toetsch wrote: >> >> Benjamin Goldberg <[EMAIL PROTECTED]> wrote: >> > There are a number of shortcomings in the API, which I'd like to >> > address here, and propose improvements for. >> >> > To allow user-defined encodings, and user-defined tr

Re: String API

2003-08-19 Thread Benjamin Goldberg
Tim Bunce wrote: > > On Tue, Aug 19, 2003 at 12:07:22AM -0400, Benjamin Goldberg wrote: > > There are a number of shortcomings in the API, which I'd like to > > address here, and propose improvements for. > > Just to be sure people are keeping it in mind, I'll repost this from > Larry: > > On W

Re: String API

2003-08-19 Thread Benjamin Goldberg
Leopold Toetsch wrote: > > Benjamin Goldberg <[EMAIL PROTECTED]> wrote: > > There are a number of shortcomings in the API, which I'd like to > > address here, and propose improvements for. > > > To allow user-defined encodings, and user-defined transcoding, > > (written in parrot) the first parame

Re: String API

2003-08-19 Thread Benjamin Goldberg
Luke Palmer wrote: > > Benjamin Goldberg writes: [snip] > >9/ New ops which provide access to the string iterator API. > > Yes. What is going to be used to store an iterator. An I reg, a P reg? > If it's a PMC, would it be possible to just implement the iterator > itself as a PMC, and use t

Re: String API

2003-08-19 Thread Tim Bunce
On Tue, Aug 19, 2003 at 12:07:22AM -0400, Benjamin Goldberg wrote: > There are a number of shortcomings in the API, which I'd like to address > here, and propose improvements for. Just to be sure people are keeping it in mind, I'll repost this from Larry: On Wed, Jan 30, 2002 at 10:47:36AM -0800,

Re: String API

2003-08-19 Thread Leopold Toetsch
Benjamin Goldberg <[EMAIL PROTECTED]> wrote: > There are a number of shortcomings in the API, which I'd like to address > here, and propose improvements for. > To allow user-defined encodings, and user-defined transcoding, (written > in parrot) the first parameter of all of the function pointers in

Re: String API

2003-08-18 Thread Luke Palmer
Benjamin Goldberg writes: > I *really* *really* want string iterators. The current API for > iterating through the characters of a string is, IMHO, vastly > insufficient. Not only because it's inconvenient, but it's also essential for doing pattern matching efficiently on some multibyte encodings
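To make the iterator argument concrete, here is a minimal sketch of a forward iterator over UTF-8 bytes. The names `utf8_iter`, `utf8_iter_init`, `utf8_iter_next` and `utf8_count` are hypothetical and are not taken from the thread or from Parrot; the point is only that a matcher walking a variable-width encoding sequentially never needs O(1) random access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator type, invented for this sketch. */
typedef struct {
    const unsigned char *pos;  /* next byte to decode */
    const unsigned char *end;  /* one past the last byte */
} utf8_iter;

static void utf8_iter_init(utf8_iter *it, const char *buf, size_t len) {
    it->pos = (const unsigned char *)buf;
    it->end = it->pos + len;
}

/* Decode the next code point and advance past it.
 * Returns 1 on success, 0 at end of input or on a malformed or
 * truncated sequence. Overlong forms are not rejected: this is a
 * sketch of sequential access, not a validator. */
static int utf8_iter_next(utf8_iter *it, uint32_t *out) {
    if (it->pos >= it->end) return 0;
    unsigned char b = *it->pos;
    size_t extra;
    uint32_t cp;
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else return 0;               /* stray continuation or invalid lead byte */
    if ((size_t)(it->end - it->pos) < extra + 1) return 0;
    for (size_t i = 1; i <= extra; i++) {
        unsigned char c = it->pos[i];
        if ((c & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (c & 0x3F);
    }
    it->pos += extra + 1;
    *out = cp;
    return 1;
}

/* Count code points in one sequential pass over the buffer. */
static size_t utf8_count(const char *buf, size_t len) {
    utf8_iter it;
    uint32_t cp;
    size_t n = 0;
    utf8_iter_init(&it, buf, len);
    while (utf8_iter_next(&it, &cp)) n++;
    return n;
}
```

Each call to `utf8_iter_next` costs time proportional to one character, so a left-to-right scan stays linear in the byte length even though indexing by character position would not be.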

String API

2003-08-18 Thread Benjamin Goldberg
There are a number of shortcomings in the API, which I'd like to address here, and propose improvements for. Not so much the string_* functions, but rather with how they work (the encoding API, the transcoding functions). To allow user-defined encodings, and user-defined transcoding, (written in p

Re: macros (was Re: string api)

2002-05-16 Thread Andy Dougherty
On Thu, 16 May 2002, Nicholas Clark wrote: > It's not nice. It's enough to drive people loopy, just looking at the output > of the preprocessor, where one source line has expanded to a 10 line wrapped > monstrosity so rich in parentheses that it couldn't be written with a dozen > Lisp fridge mag

Re: macros (was Re: string api)

2002-04-09 Thread Robert Spier
Melvin Smith wrote: > At 10:30 PM 4/8/2002 -0700, Robert Spier wrote: >>> Keep track of global (or interpreter local) scope with a macro >>> upon entry. >> I shudder every time someone says "macro" on p6i. >> perl5 has several thousand macros defined. (grep for ^#define) (over > Are you counting

Re: string api

2002-04-09 Thread Roman Hunt
part of the internal string api. although I am not certain I would hope that these wouldn't be collected during computation. I have not looked at any of the GC parts of the source yet, I will soon. I take it that "immortal" is a flag that I can |= out of the flag INTVAL? will I need to d

Re: string api

2002-04-08 Thread Melvin Smith
At 01:48 AM 4/9/2002 -0400, Michel J Lambert wrote: > > the malloc()/free() situation which is one of the primary reasons we > > use garbage collection in the first place, so why reinvent the same > > situation with different syntax? > >Generally, malloc/free are used in more complex situations th

Re: string api

2002-04-08 Thread Michel J Lambert
> I agree we need an overall architectural solution. Setting and clearing > bits manually is error-prone but fast, as you said. It's identical to > the malloc()/free() situation which is one of the primary reasons we > use garbage collection in the first place, so why reinvent the same > situation

Re: macros (was Re: string api)

2002-04-08 Thread Melvin Smith
At 10:30 PM 4/8/2002 -0700, Robert Spier wrote: >>Keep track of global (or interpreter local) scope with a macro >>upon entry. > >I shudder every time someone says "macro" on p6i. > >perl5 has several thousand macros defined. (grep for ^#define) (over 8000 >if you include all the embedding macr

macros (was Re: string api)

2002-04-08 Thread Robert Spier
> Keep track of global (or interpreter local) scope with a macro > upon entry. I shudder every time someone says "macro" on p6i. perl5 has several thousand macros defined. (grep for ^#define) (over 8000 if you include all the embedding macros. it's down to ~4000 if you cut out embedding, co

Re: string api

2002-04-08 Thread Melvin Smith
At 11:40 PM 4/8/2002 -0400, Michel J Lambert wrote: > > 2) I'm thinking of an internal stack not visible to user code that we use > > for temporary PMCs and Buffers and a simple macro for entry and > > exit of GC sensitive routines. I think I might have mentioned this. > >What defines a

Re: string api

2002-04-08 Thread Steve Fink
On Mon, Apr 08, 2002 at 11:40:28PM -0400, Michel J Lambert wrote: > However, if we can't find all the places we do buffer manipulation to mark > them immortal, how are we going to properly identify all the GC-sensitive > functions? Ack! Sorry for being anal, but I finally decided the 'immortal' n

Re: string api

2002-04-08 Thread Michel J Lambert
> >This message does remind me of how empty the TODO list is. Surely we > >can think of many more things to be done? > > Speaking of.. > > 1) Bugfix release please, we banged quite a few stack and GC bugs out. > Don't we get any dessert? Peter has already stated he'd like his parrot_realloca

Re: string api

2002-04-08 Thread Melvin Smith
At 06:10 PM 4/8/2002 -0700, Steve Fink wrote: >On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote: > > At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote: > > >find the definition for the string_vtable it is not in > > > > Try classes/perlstring.pmc > > > > Keep in mind there is the pr

Re: string api

2002-04-08 Thread Steve Fink
On Mon, Apr 08, 2002 at 07:01:44PM -0400, Melvin Smith wrote: > At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote: > >find the definition for the string_vtable it is not in > > Try classes/perlstring.pmc > > Keep in mind there is the primitive STRING type which is the S* registers, > and then

Re: string api

2002-04-08 Thread Melvin Smith
At 05:49 PM 4/8/2002 -0400, Roman Hunt wrote: >hello: > and importance, but I feel up to the task. (Read: "Please, be > patient with the newbie"). I have begun work on The more the merrier, it's been too quiet this last week. > find the definition for the string_vtable i

string api

2002-04-08 Thread Roman Hunt
hello: I am interested in contributing to the project. (Thank Dan's cross-country tour :) This is my first project of this size and importance, but I feel up to the task. (Read: "Please, be patient with the newbie"). I have begun work on string_nprintf()

Re: String API

2001-09-15 Thread Dave Storrs
(As previously remarked, I'm trying to catch up from a few days offline, so excuse me if this is OOD.) On Tue, 11 Sep 2001, Ken Fox wrote: > The interpreter knows the internals of the stack structure and is > responsible for managing it. To change the stack implementation, we'll > have to caref

Re: String API

2001-09-14 Thread Simon Cozens
On Thu, Sep 13, 2001 at 05:59:13PM +0100, Tom Hughes wrote: > Especially since all function names starting with str are strictly > speaking reserved to ANSI/ISO for future expansion of the string.h > facilities ;-) Oh blow. If you're worried about namespace pollution, you're definitely blaming th

Re: String API

2001-09-13 Thread Tom Hughes
In message <[EMAIL PROTECTED]> Benjamin Stuhl <[EMAIL PROTECTED]> wrote: > Thus wrote the illustrious Simon Cozens: > [severely trimmed] > > STRING* string_make(void *buffer, IV buflen, IV > > encoding, IV flags, IV type) > > STRING* string_copy(STRING* s) > > void string_de

Re: String API

2001-09-13 Thread Simon Cozens
On Tue, Sep 11, 2001 at 09:08:18AM +0100, Simon Cozens wrote: > > The use of an encoding enum seems a little weird, but once > > you explain why it will probably make sense. Right now the > > only thing it seems good for is the transcoding system -- > > everything else is slower and vtables are mo

Re: String API

2001-09-11 Thread Dan Sugalski
At 01:35 PM 9/11/2001 -0400, Ken Fox wrote: >Dan Sugalski wrote: > > If you're speaking of multiple buffers for a string or something like that, > > you're looking at too low a level. That's something that should go in the > > variables, not in the string bits. (We do *not* want all string ops slo

Re: String API

2001-09-11 Thread Ken Fox
Dan Sugalski wrote: > If you're speaking of multiple buffers for a string or something like that, > you're looking at too low a level. That's something that should go in the > variables, not in the string bits. (We do *not* want all string ops slow to > support flexibility of this sort. Only the b

Re: String API

2001-09-11 Thread Dan Sugalski
At 12:15 PM 9/11/2001 -0400, Ken Fox wrote: >Simon Cozens wrote: > > On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote: > > > Have you guys seen Topaz? > > > > I may have heard of it, yes. > >That's it? You're rejecting all of that work without >learning anything from it? Building strings on

Re: String API

2001-09-11 Thread Simon Cozens
On Tue, Sep 11, 2001 at 12:15:37PM -0400, Ken Fox wrote: > Simon Cozens wrote: > > On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote: > > > Have you guys seen Topaz? > > > > I may have heard of it, yes. > > That's it? You're rejecting all of that work without > learning anything from it?

Re: String API

2001-09-11 Thread Ken Fox
Simon Cozens wrote: > On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote: > > Have you guys seen Topaz? > > I may have heard of it, yes. That's it? You're rejecting all of that work without learning anything from it? Building strings on buffers looked like a really good idea. In general I

Re: String API

2001-09-11 Thread Simon Cozens
On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote: > Have you guys seen Topaz? I may have heard of it, yes. > The other major suggestion I have is to avoid "void *" > interfaces. I'm using void * to avoid char *. :) > Do we really need a string_make() that takes > the encoding enum?

Re: String API

2001-09-10 Thread Ken Fox
Simon Cozens wrote: > =head1 The Parrot String API Have you guys seen Topaz? One of many things I think Chip did right was to build strings from a low-level buffer concept. This moves memory management (and possibly raw-io) out of the string class and into the buffer class. The other ma

Re: String API

2001-09-10 Thread Dan Sugalski
At 12:46 PM 9/10/2001 -0400, Jason Gloudon wrote: >Will the buffers associated with a string be managed by Parrot's memory >management, and be visible to the garbage collector ? Or will these buffers be >allocated from their own pool of memory not subject to garbage collection. They'll be GC'd.

Re: String API

2001-09-10 Thread Jason Gloudon
Will the buffers associated with a string be managed by Parrot's memory management, and be visible to the garbage collector ? Or will these buffers be allocated from their own pool of memory not subject to garbage collection. -- Jason

Re: String API

2001-09-10 Thread Edwin Steiner
[EMAIL PROTECTED] (Simon Cozens) writes: > =head2 C > > This field is, as its name suggests, unused; however, it can be used to > hold a pointer to the correct vtable for foreign strings. Wouldn't it be better to put a vtable * directly inside struct parrot_string instead of the 'encoding' enum

Re: String API

2001-09-10 Thread Dan Sugalski
At 12:53 PM 9/10/2001 +0100, Nicholas Clark wrote: >On Sun, Sep 09, 2001 at 10:16:27PM +0100, Simon Cozens wrote: > > =head1 Elements of the C structure > > > > Those implementing the C API will obviously need to know about > > how the C structure works. You can find the definition of this > > str

Re: String API

2001-09-10 Thread Benjamin Stuhl
--- Simon Cozens <[EMAIL PROTECTED]> wrote: > On Mon, Sep 10, 2001 at 04:48:35AM -0700, Benjamin Stuhl > wrote: > > *cough* Namespace pollution *cough* > > These should probably all be prefixed... > > You're going to have a canary when you see the rest of > the code... :) I know. I've looked at t

Re: String API

2001-09-10 Thread Simon Cozens
On Mon, Sep 10, 2001 at 12:53:51PM +0100, Nicholas Clark wrote: > > void *bufstart; > > A split buffer would allow an offset at the front, by effectively treating > the STRING as '' . 'Pathologically Eclectic Rubbish Lister' Who says we don't support split buffers? void* bufstart can be an

Re: String API

2001-09-10 Thread Simon Cozens
On Mon, Sep 10, 2001 at 04:48:35AM -0700, Benjamin Stuhl wrote: > *cough* Namespace pollution *cough* > These should probably all be prefixed... You're going to have a canary when you see the rest of the code... :) Seriously, I see the string subsystem as being self-sufficient; it can be detached

Re: String API

2001-09-10 Thread Benjamin Stuhl
Thus wrote the illustrious Simon Cozens: [severely trimmed] > STRING* string_make(void *buffer, IV buflen, IV > encoding, IV flags, IV type) > STRING* string_copy(STRING* s) > void string_destroy(STRING *s) *cough* Namespace pollution *cough* These should probably all be prefixed...
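As an illustration of the prefixing suggestion, the three quoted signatures could be sketched as follows. The `parrot_` prefix, the `IV` typedef and every struct field except `bufstart` are assumptions made for this example; the thread does not say which prefix, if any, was chosen, and this is not Parrot's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long IV;  /* assumption: stand-in for the integer type named IV */

/* Assumed layout; only bufstart is named elsewhere in the thread. */
typedef struct parrot_string {
    void *bufstart;
    IV buflen;
    IV encoding;
    IV flags;
    IV type;
} STRING;

/* Same parameters as the quoted string_make(), with a hypothetical
 * prefix that keeps the name out of the namespace string.h reserves.
 * The buffer contents are copied. Returns NULL on allocation failure. */
STRING *parrot_string_make(void *buffer, IV buflen, IV encoding,
                           IV flags, IV type) {
    assert(buffer != NULL && buflen >= 0);
    STRING *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->bufstart = malloc(buflen > 0 ? (size_t)buflen : 1);
    if (s->bufstart == NULL) { free(s); return NULL; }
    memcpy(s->bufstart, buffer, (size_t)buflen);
    s->buflen = buflen;
    s->encoding = encoding;
    s->flags = flags;
    s->type = type;
    return s;
}

/* Deep copy: the new STRING owns its own buffer. */
STRING *parrot_string_copy(STRING *s) {
    return parrot_string_make(s->bufstart, s->buflen, s->encoding,
                              s->flags, s->type);
}

void parrot_string_destroy(STRING *s) {
    if (s == NULL) return;
    free(s->bufstart);
    free(s);
}
```

Only the external names change; callers keep the same argument order as in the quoted prototypes.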

String API

2001-09-10 Thread Simon Cozens
tch it up based on the results of this thread so I can go on writing the next bit of documentation... =head1 The Parrot String API This document describes how Parrot abstracts the programmer's interface to string types. All strings used in the Parrot core should use the Parrot C structure

Re: The internal string API

2001-06-29 Thread Dan Sugalski
At 07:57 PM 6/28/2001 -0500, Jarkko Hietaniemi wrote: >On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote: > > If I have a file in French, and a file in Chinese, I want one to > > be treated as French, and the other as Chinese. > >And what do you do once you have a list of say, employees,

Re: The internal string API

2001-06-28 Thread Jarkko Hietaniemi
On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote: > On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote: > > >But a locale is a collection of user preferences. How I want > >my dates to be formatted, how I want my strings to be sorted. > > That's not right. If I do a text con

Re: The internal string API

2001-06-28 Thread Bart Lateur
On Tue, 19 Jun 2001 14:51:43 -0500, Jarkko Hietaniemi wrote: >But a locale is a collection of user preferences. How I want >my dates to be formatted, how I want my strings to be sorted. That's not right. If I do a text conversion from Windows to Mac, I would want the source to use the CP-1252 lo

Re: The internal string API

2001-06-20 Thread David L. Nicol
Dave Mitchell wrote: > some sort of clone method With tree strings, at clone time they get reorged into minimal number of nodes: back to one big block if they are all the same type, or back to one block for each type transition if it is tagged data. Having the basic string type support arbi

RE: The internal string API

2001-06-20 Thread Hong Zhang
make a stab at rendering Unicode > >(not a very good one I am the 1st to admit which is why it isn't released!). > > > >It would be good if Tk-for-perl6 did not have to break the rules or > >provide its own hooks for meta data and could use "the" string API. &

Re: The internal string API

2001-06-20 Thread Dan Sugalski
ough to give it a GUI. >perl5.7.1+/Tk803.???-to-be will now make a stab at rendering Unicode >(not a very good one I am the 1st to admit which is why it isn't released!). > >It would be good if Tk-for-perl6 did not have to break the rules or >provide i

RE: The internal string API

2001-06-20 Thread Dan Sugalski
At 10:31 AM 6/20/2001 -0700, Hong Zhang wrote: > > The one problem with copy-on-write is that, if we implement it in >software, > > we end up paying the price to check it on every string write. (No free > > depending on the hardware, alas) > > > > Not that this should shoot down the idea of COW s

RE: The internal string API

2001-06-20 Thread Hong Zhang
> The one problem with copy-on-write is that, if we implement it in software, > we end up paying the price to check it on every string write. (No free > depending on the hardware, alas) > > Not that this should shoot down the idea of COW strings, but it is a cost > that needs considering. (I

Re: The internal string API

2001-06-20 Thread Nick Ing-Simmons
he 1st to admit which is why it isn't released!). It would be good if Tk-for-perl6 did not have to break the rules or provide its own hooks for meta data and could use "the" string API. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: The internal string API

2001-06-20 Thread Dave Mitchell
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 05:43 PM 6/19/2001 -0500, David L. Nicol wrote: > > set $B to copy-on-write mode, so future changes to $B do not > > affect $A > > The one problem with copy-on-write is that, if we implement it in software, > we end up paying the price to che

Re: The internal string API

2001-06-20 Thread Dan Sugalski
At 05:43 PM 6/19/2001 -0500, David L. Nicol wrote: > set $B to copy-on-write mode, so future changes to $B do not > affect $A The one problem with copy-on-write is that, if we implement it in software, we end up paying the price to check it on every string write. (No free depending on
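The cost being described can be seen in a small sketch of software copy-on-write. The `cow_str` type and its functions are hypothetical, invented for this example; allocation failure is treated as fatal to keep it short. The branch at the top of `cow_set_char` is the check that every write pays, whether or not the buffer is shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write string handle. */
typedef struct {
    char *data;
    size_t len;
    int *refs;   /* reference count shared by all handles on data */
} cow_str;

static cow_str cow_new(const char *src, size_t len) {
    cow_str s;
    s.data = malloc(len ? len : 1);
    s.refs = malloc(sizeof *s.refs);
    assert(s.data && s.refs);  /* sketch: allocation failure is fatal */
    memcpy(s.data, src, len);
    s.len = len;
    *s.refs = 1;
    return s;
}

/* "$B = $A": share the buffer instead of copying it. */
static cow_str cow_share(const cow_str *s) {
    ++*s->refs;
    return *s;
}

static void cow_set_char(cow_str *s, size_t i, char c) {
    assert(i < s->len);
    if (*s->refs > 1) {        /* the check paid on every write */
        char *copy = malloc(s->len);
        int *refs = malloc(sizeof *refs);
        assert(copy && refs);
        memcpy(copy, s->data, s->len);
        --*s->refs;            /* detach from the shared buffer */
        s->data = copy;
        s->refs = refs;
        *s->refs = 1;
    }
    s->data[i] = c;
}

static void cow_free(cow_str *s) {
    if (--*s->refs == 0) {
        free(s->data);
        free(s->refs);
    }
    s->data = NULL;
    s->refs = NULL;
    s->len = 0;
}
```

A write to an unshared string takes the fast path, but it still executes the comparison; that per-write test is the overhead the message weighs against the saved copies.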

Re: The internal string API

2001-06-20 Thread Dan Sugalski
At 03:17 PM 6/20/2001 +0200, Bart Lateur wrote: >On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote: > > >> * Do a substr operation by character and glyph > > > >The byte based is more useful. I have utf-8, and I want to substr it > >to another utf-8. It is painful to convert it or linear search

RE: The internal string API

2001-06-20 Thread Dan Sugalski
At 04:23 PM 6/19/2001 -0700, Hong Zhang wrote: >This is the common approach of complicated text representation, >the implementations I have seen include IBM IText and SGI >rope. For "rope", each rope is represented by either of a simple >immutable string, a simple mutable string, a simple immutab

Re: The internal string API

2001-06-20 Thread Bart Lateur
On Tue, 19 Jun 2001 11:53:28 -0700, Hong Zhang wrote: >> * Do a substr operation by character and glyph > >The byte based is more useful. I have utf-8, and I want to substr it >to another utf-8. It is painful to convert it or linear search for >character >position. I tend to agree. I currently

RE: The internal string API

2001-06-19 Thread Hong Zhang
This is the common approach of complicated text representation, the implementations I have seen include IBM IText and SGI rope. For "rope", each rope is represented by either of a simple immutable string, a simple mutable string, a simple immutable substring of another rope, or a binary node of
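A rough sketch of the rope shape described above, as a tagged union in C. All names are hypothetical. Two simplifications are assumed: the mutable and immutable leaf kinds are collapsed into a single leaf, and the binary node is taken to be the concatenation of two ropes, which the truncated message does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ROPE_LEAF, ROPE_SUBSTR, ROPE_CONCAT } rope_kind;

/* Hypothetical rope node; nodes borrow their children and leaf text. */
typedef struct rope {
    rope_kind kind;
    size_t len;
    union {
        const char *leaf;                                   /* flat text */
        struct { const struct rope *base; size_t start; } sub;
        struct { const struct rope *left, *right; } cat;
    } u;
} rope;

static rope rope_leaf(const char *s) {
    rope r;
    r.kind = ROPE_LEAF;
    r.len = strlen(s);
    r.u.leaf = s;
    return r;
}

static rope rope_substr(const rope *base, size_t start, size_t len) {
    rope r;
    assert(start <= base->len && len <= base->len - start);
    r.kind = ROPE_SUBSTR;
    r.len = len;
    r.u.sub.base = base;
    r.u.sub.start = start;
    return r;
}

static rope rope_concat(const rope *left, const rope *right) {
    rope r;
    r.kind = ROPE_CONCAT;
    r.len = left->len + right->len;
    r.u.cat.left = left;
    r.u.cat.right = right;
    return r;
}

/* Fetch one character by walking down to the leaf that holds it. */
static char rope_at(const rope *r, size_t i) {
    assert(i < r->len);
    for (;;) {
        switch (r->kind) {
        case ROPE_LEAF:
            return r->u.leaf[i];
        case ROPE_SUBSTR:
            i += r->u.sub.start;
            r = r->u.sub.base;
            break;
        case ROPE_CONCAT:
            if (i < r->u.cat.left->len) {
                r = r->u.cat.left;
            } else {
                i -= r->u.cat.left->len;
                r = r->u.cat.right;
            }
            break;
        }
    }
}
```

Concatenation and substring are constant-time node constructions that copy no text, while indexing costs one step per level of the tree.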

Re: The internal string API

2001-06-19 Thread David L. Nicol
Dan Sugalski wrote: > >If the internal string API is a tree instead of a contiguous memory block, > >the tagging could be done at the node or branch level. > > > >Besides, you get nondestructive inserts. > > Yup. The only problem is that it makes the string data s

Re: The internal string API

2001-06-19 Thread Dan Sugalski
ed to me. With that in mind, it's a far less useful thing to tag data > > with. > >If the internal string API is a tree instead of a contiguous memory block, >the tagging could be done at the node or branch level. > >Besides, you get nondestructive inserts. Yup. The o

Re: The internal string API

2001-06-19 Thread David L. Nicol
mat-text > > -buffer. Current locale or explicit locale parameter will suffice your goal. > > On the other hand, the case of mixed-data strings was one that hadn't > occurred to me. With that in mind, it's a far less useful thing to tag data > with. If the internal strin

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> Taiwanese read traditional Chinese characters, but PRC people read > simplified Chinese. Even we take the same data, and same program (code), > people just read differently. As an end user, I want to make the decision. > It will drive me crazy if Perl renders/displays the text file using > tradition

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:51 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: > > Gah. I thought (and I use the word loosely here) that locales generally > > specified how a particular character should be interpreted when there's > > some ambiguity--the high bit ASCII characters spring to mind, given > there's > > a doz

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> I think you misunderstand my point. It is not "a property of the code region", > but "a property of the context in which the code is running". For > example, > Taiwanese read traditional Chinese characters, but PRC people read > simplified Chinese. Even we take the same data, and same program (code

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
> Gah. I thought (and I use the word loosely here) that locales generally > specified how a particular character should be interpreted when there's > some ambiguity--the high bit ASCII characters spring to mind, given there's > a dozen or more different interpretations with them. I was under th

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:31 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: > > I think you misunderstand my point. It is not "a property of the code region", > > but "a property of the context in which the code is running". For > > example, > > Taiwanese read traditional Chinese characters, but PRC people read > > simp

RE: The internal string API

2001-06-19 Thread Dan Sugalski
At 12:25 PM 6/19/2001 -0700, Hong Zhang wrote: > > >What do you mean by character size if it does not support variable >length? > > > > Well, if strings are to be treated relatively abstractly, and we still >want > > to poke around through the string buffer, we need to know how big a > > characte

RE: The internal string API

2001-06-19 Thread Hong Zhang
> >What do you mean by character size if it does not support variable length? > > Well, if strings are to be treated relatively abstractly, and we still want > to poke around through the string buffer, we need to know how big a > character is. I agree on this. I think support variable length

RE: The internal string API

2001-06-19 Thread Hong Zhang
> * Convert from and to UTF-32 > * lengths in bytes, characters, and possibly glyphs > * character size (with the variable length ones reporting in negative numbers) What do you mean by character size if it does not support variable length? > * get and set the locale (This might not be the spot

The internal string API

2001-06-19 Thread Dan Sugalski
Since we're going to try and take a shot at being encoding-neutral in the core, we're going to need some form of string API so the core can actually manipulate string data. I'm thinking we'll need to be able to at least do this with string: * Convert from and to UTF-3