Dan Sugalski wrote:
> >If the internal string API is a tree instead of a contiguous memory block,
> >the tagging could be done at the node or branch level.
> >
> >Besides, you get nondestructive inserts.
>
> Yup. The only problem is that it makes the string data significantly more
> complex. I don't think it's a win for raw data. Perhaps for fancier data
> it'd work.
the simplest tree is one node with a raw block in it. Only when you start
doing
things to it
substr($A, 27, 3, $B)
and suchlike
does deferring the copying give a win.
Say $A is 35 megabytes long and $B is 23K. Currently, and in any
string representation that uses raw blocks, we have to do these things:
copy substr($A,27,3) to return value if needed
Allocate a new 36M block
copy substr($A,0,27)
copy $B
copy substr($A,30)
set $A's data pointer to the new block
free $A's old block
With a tree representation, the assign-to-middle operation becomes:
Return Value if needed is substr($A,27,3)
Create a new string-segment-list-node
Segment 1: substr($A,0,27)
Segment 2: $B (which might be another tree)
Segment 3: substr($A,30)
return $A's old top node to the node pool
set $A's data pointer to the new top node
set $B to copy-on-write mode, so future changes to $B do not affect $A
no new allocations!
This kind of thing also allows us to do "live interpolation" in which
ql" this will $change "
might rewrite to a magic scalar that evaluates the join every time
it is fetched instead of once when it is built.
Mixed-type? Yes! You could even have a value that is not a string at all,
hanging off your string tree.