Leopold Toetsch wrote:
>
> Benjamin Goldberg <[EMAIL PROTECTED]> wrote:
> > There are a number of shortcomings in the API, which I'd like to
> > address here, and propose improvements for.
>
> > To allow user-defined encodings, and user-defined transcoding,
> > (written in parrot) the first parameter of all of the function
> > pointers in the ENCODING and TYPE structures should be INTERP.
>
> This belongs IMHO into PerlString (or better a class derived from that).

Then how do we pass a user-defined string to a function which expects
an argument in an Sreg?

In particular, consider if the string is Really Really big, and actually
resides on disk instead of in memory.  If we had to convert from a
PerlString derivative to a STRING*, then we'd have to load that whole
file into memory.  Ugh.

> > I *really* *really* want string iterators.  The current API for
> > iterating through the characters of a string is, IMHO, vastly
> > insufficient.
>
> encoding->skip_forward(.., by_n) doesn't look like that insufficient. A
> skip_one() function wouldn't harm though.

That wasn't precisely what I was speaking of.

> > 1/ Iterators won't become invalid if the string gets moved in
> > memory.
>
> > Currently, all we've got is a void* pointer which points into the
> > buffer of the string; during GC, strings can get reallocated, making
> > the pointer invalid.
>
> You are not allowed to cache the pointer.

That's my point.  I want an iterator value which I *can* cache.

> string->strstart + idx is always your actual character in the string.

The pointer can get invalidated even without storing it for any length
of time.  Consider:

   PMC * array = pmc_new( interpreter, enum_class_Sarray );
   void * iter = str->strstart;
   INTVAL i = 0, len = string_length( str );
   VTABLE_set_integer( array, string_length( str ) );
   for( i = 0; i < len; ++i ) {
      INTVAL c = str->encoding->decode( iter );
      VTABLE_set_integer_keyed_int( array, i, c );
      iter = str->encoding->skip_forward( iter, 1 );
   }

What happens if VTABLE_set_integer on that sarray causes a gc to get
run, and if that gc causes the string's buffer to get moved?  Oops.

Is the code construct I have above forbidden?

> To satisfy 1/ we would have to mark the string as "immobile" (which we
> have a flag for) *but* you can't grow such strings, the copying
> collector can't cleanup the block, where the string is in (and worse,
> the collector currently just frees the block).

Indeed.
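
To make the hazard concrete outside of Parrot, here is a small
self-contained sketch.  Nothing in it is Parrot API: ToyString,
toy_init, toy_move_buffer and toy_demo are hypothetical names made up
for illustration, the "collector" is simulated by copying the buffer to
a new block and scribbling over the old one, and a single-byte encoding
is assumed so that idx is also a byte offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string header whose buffer can be moved. */
typedef struct {
    char  *strstart;  /* current start of the character data */
    size_t buflen;    /* bytes in the buffer, including the trailing NUL */
} ToyString;

/* Allocate a buffer and copy text into it; returns 0 on success. */
static int toy_init(ToyString *s, const char *text) {
    s->buflen = strlen(text) + 1;
    s->strstart = malloc(s->buflen);
    if (s->strstart == NULL) return -1;
    memcpy(s->strstart, text, s->buflen);
    return 0;
}

/* Simulate a copying collector: move the data to a fresh block and
 * overwrite the old block as if something else had reused it.
 * Returns the old block (kept allocated so later reads stay defined),
 * or NULL if the new allocation failed. */
static char *toy_move_buffer(ToyString *s) {
    char *old = s->strstart;
    char *fresh = malloc(s->buflen);
    if (fresh == NULL) return NULL;
    memcpy(fresh, old, s->buflen);
    memset(old, '?', s->buflen - 1);
    s->strstart = fresh;
    return old;
}

/* Read character idx of "parrot" after a simulated move, once through a
 * pointer taken before the move and once through strstart + idx. */
static int toy_demo(size_t idx, char *via_cached, char *via_offset) {
    ToyString s;
    if (toy_init(&s, "parrot") != 0) return -1;
    assert(idx < s.buflen - 1);

    char *cached = s.strstart + idx;   /* raw pointer into the buffer */
    char *old = toy_move_buffer(&s);   /* the buffer moves underneath it */
    if (old == NULL) { free(s.strstart); return -1; }

    *via_cached = *cached;             /* stale: reads the scribbled old block */
    *via_offset = s.strstart[idx];     /* re-derived from the current strstart */

    free(old);
    free(s.strstart);
    return 0;
}
```

Calling toy_demo(2, &a, &b) leaves '?' in a and 'r' in b: the pointer
taken before the move reads whatever now occupies the old block, while
strstart + idx still finds the right character.  In a real collector
the old block may be freed outright, so the stale read would not even
be well defined.
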

That's why using a pointer into the string is so very insufficient.

Now, suppose that instead of a pointer, we had an integer describing the
number of bytes from strstart to where we're looking... *now* most of
the problems go away.  It would no longer matter if the string got
moved, would it?

The drawback of course is that we'd need to add it to the str->strstart
pointer each time we want to use it... but that's not especially
painful.

> > 10/ Add methods to PerlString to make it compatible with Iterator.
>
> Yep. That was in my iterator proposal.
>
> > 11/ Any string_ function which takes a character index as a
> > parameter, should be able to take a string iterator.
>
> Bloat IMHO. While this abstraction is flexible, it IMHO doesn't belong
> into the string subsystem but into a string class, that implements these
> functions.

The bloat can be avoided if the primary string_ implementations *only*
took string iterators.  Then, to satisfy those who want to use character
indices, provide wrappers which take character index arguments and
convert them into string iterators relative to those particular strings.

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED] ]\n";
((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}