Back to the GC issue: I was wondering about something.
Suppose I have a string stored in $foo, say, "abcbca", and then I do:
$bar = $foo;
$foo .= "xyzyzx";
I see two ways of handling this: one allows a string value to be shared by
two or more variables, and the other does not.
If I allow a string value to be shared between two variables, $bar = $foo
only makes $bar point to the same string $foo points to. But when I do
$foo .= "xyzyzx", I cannot reuse the buffer $foo points to, because I
don't know whether the string pointed to by $foo is shared or not (in this
case I could, but consider that the two statements may be separated by many
others, including sub calls; suppose $foo is actually a parameter to a sub
and was passed in with its value already shared with another variable). So I
must allocate a new buffer, copy $foo's buffer into it, and do the `concat'
operation with the string "xyzyzx".
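A minimal C sketch of the sharing approach, assuming a hypothetical runtime in which a scalar variable (`Var`) just holds a pointer to a string value (`Str`) that is never modified after creation, and unreachable values are left for the GC to reclaim. Assignment is O(1), but every .= has to copy the old contents, because the value might be shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string value: never modified after creation, so any
   number of variables may point at the same one. */
typedef struct {
    size_t len;
    char  *data;   /* NUL-terminated for convenience */
} Str;

typedef struct { Str *val; } Var;   /* a scalar variable just holds a pointer */

static Str *str_new(const char *s, size_t len) {
    Str *p = malloc(sizeof *p);
    p->data = malloc(len + 1);
    memcpy(p->data, s, len);
    p->data[len] = '\0';
    p->len = len;
    return p;
}

/* $bar = $foo : O(1), just share the pointer. */
static void var_assign(Var *dst, const Var *src) { dst->val = src->val; }

/* $foo .= s : we cannot tell whether v->val is shared, so we build a
   fresh buffer and copy the old contents (O(old length)). The old Str
   is left for the GC once nothing points at it. */
static void var_concat(Var *v, const char *s) {
    size_t add = strlen(s);
    Str *old = v->val;
    Str *p = malloc(sizeof *p);
    p->len = old->len + add;
    p->data = malloc(p->len + 1);
    memcpy(p->data, old->data, old->len);
    memcpy(p->data + old->len, s, add + 1);  /* includes the NUL */
    v->val = p;
}
```

After `var_assign(&bar, &foo)` both variables point at the same `Str`; after `var_concat(&foo, "xyzyzx")`, $foo points at a new copy and $bar still sees "abcbca".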
If I don't allow a string value to be shared between two variables, $foo .=
"xyzyzx" can use $foo's buffer, if there's room in it, because it's
guaranteed that no other variable points to it. However, $bar = $foo needs
to clone $foo's string, since it cannot be shared between $foo and $bar.
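The non-sharing approach, sketched under the same kind of hypothetical runtime: each variable owns its buffer (`OStr`, a made-up name), so assignment has to clone, while .= may append in place and only grows the buffer (geometrically) when it is full. `ostr_assign` assumes `dst` is either initialized or zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owned string: every variable has its own buffer, so it
   is never shared and may be modified in place. */
typedef struct {
    size_t len, cap;   /* cap excludes the trailing NUL */
    char  *data;
} OStr;

static void ostr_init(OStr *s, const char *text) {
    s->len = s->cap = strlen(text);
    s->data = malloc(s->cap + 1);
    memcpy(s->data, text, s->len + 1);
}

/* $bar = $foo : must clone, O(length of $foo). */
static void ostr_assign(OStr *dst, const OStr *src) {
    if (dst == src) return;
    char *copy = malloc(src->len + 1);
    memcpy(copy, src->data, src->len + 1);
    free(dst->data);               /* free(NULL) is fine for a zeroed dst */
    dst->data = copy;
    dst->len = dst->cap = src->len;
}

/* $foo .= text : reuse the buffer when it has room; otherwise double it,
   so a long run of appends costs amortized O(1) per appended byte. */
static void ostr_concat(OStr *s, const char *text) {
    size_t add = strlen(text);
    if (s->len + add > s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 16;
        while (ncap < s->len + add) ncap *= 2;
        s->data = realloc(s->data, ncap + 1);
        s->cap = ncap;
    }
    memcpy(s->data + s->len, text, add + 1);
    s->len += add;
}
```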
In this example neither the cloning nor the failure to reuse the buffer
costs much, because the strings are short. But imagine $foo actually holds a
10KB string, or something even bigger... And imagine the = or .= is
actually being done in a loop that may run some hundreds of times...
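To put rough numbers on the loop case: if an L-byte string gets n appends of k bytes each, copying on every .= moves nL + k·n(n+1)/2 bytes in total, while in-place reuse with a doubling buffer moves about L + nk. The sketch below just counts bytes under those two models; it assumes the buffer starts exactly full, and the values k = 10 and n = 100 in the note after it are arbitrary illustrations, not measurements.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Sharing model: every .= allocates a new buffer and copies the whole
   result, so append i copies L + i*k bytes. */
static u64 bytes_copied_always(u64 L, u64 k, u64 n) {
    u64 total = 0, len = L;
    for (u64 i = 0; i < n; i++) {
        len += k;
        total += len;              /* old contents + the k new bytes */
    }
    return total;
}

/* Unshared model: .= writes in place; only when the buffer is full is it
   doubled, which copies the existing contents once. */
static u64 bytes_copied_reuse(u64 L, u64 k, u64 n) {
    u64 total = 0, len = L, cap = L;   /* assume the buffer starts full */
    for (u64 i = 0; i < n; i++) {
        if (len + k > cap) {
            while (len + k > cap) cap = cap ? cap * 2 : k;
            total += len;          /* reallocation moves existing bytes */
        }
        len += k;
        total += k;                /* the appended bytes themselves */
    }
    return total;
}
```

For L = 10240, k = 10 and n = 100, the first function gives 1,074,500 bytes and the second 11,240, roughly a factor of 95.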
Given mark-and-sweep or another advanced GC, which of them is better:
sharing the value or cloning it on each assignment? (If there isn't much
difference, I'd stick with sharing the value, since it seems simpler to
implement and doesn't require a clone operator, which can be messy.)
I actually see two other approaches to this problem. One is using
refcounted GC. Why? Because if the reference count is 1, I know this is the
only object holding the buffer and no other object is looking at it, so I
could reuse the buffer on .= without any problem at all.
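The refcounting idea is essentially copy-on-write. A hypothetical C sketch (the `RcStr` type and `rc_*` functions are made up for illustration): assignment shares the value and bumps the count, and .= appends in place when the count is 1, otherwise it copies first and drops its reference to the shared value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string: several variables may share one RcStr. */
typedef struct {
    size_t refcnt, len, cap;
    char  *data;
} RcStr;

static RcStr *rc_new(const char *text) {
    RcStr *s = malloc(sizeof *s);
    s->refcnt = 1;
    s->len = s->cap = strlen(text);
    s->data = malloc(s->cap + 1);
    memcpy(s->data, text, s->len + 1);
    return s;
}

static void rc_release(RcStr *s) {
    if (--s->refcnt == 0) { free(s->data); free(s); }
}

/* $bar = $foo : share the value, O(1). Increment first so that
   self-assignment cannot free the value. */
static void rc_assign(RcStr **dst, RcStr *src) {
    src->refcnt++;
    if (*dst) rc_release(*dst);
    *dst = src;
}

/* $foo .= text : with refcnt == 1 nobody else can see the buffer, so
   append in place; otherwise copy first (copy-on-write). */
static void rc_concat(RcStr **v, const char *text) {
    RcStr *s = *v;
    size_t add = strlen(text);
    if (s->refcnt > 1) {
        RcStr *copy = malloc(sizeof *copy);
        copy->refcnt = 1;
        copy->len = s->len;
        copy->cap = s->len + add;
        copy->data = malloc(copy->cap + 1);
        memcpy(copy->data, s->data, s->len + 1);
        s->refcnt--;               /* still > 0: other owners keep it */
        *v = s = copy;
    } else if (s->len + add > s->cap) {
        s->cap = (s->len + add) * 2;
        s->data = realloc(s->data, s->cap + 1);
    }
    memcpy(s->data + s->len, text, add + 1);
    s->len += add;
}
```

The first .= on a shared value pays for one copy; every later .= on the now-private value reuses it in place.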
The other would be to put the burden on the compiler. If the compiler sees
something like
foreach $i (1..$big_loop) {
$foo .= ", $i";
}
it would clone $foo's value _before_ entering the loop. This would ensure
$foo's value is not shared with any other variable (as long as nothing
inside the loop shares it again), so $foo's buffer could be used for the .=
operation, with a new one created only if that one overflows.
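What the compiler might emit for that loop, as a hypothetical C sketch: `clone_into` runs once before the loop to give $foo a private copy, and `append_in_place` is used inside it. Both helpers and `compiled_loop` are invented names, and the in-place append is only safe under the stated assumption that the loop body never shares $foo's value again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t len, cap; char *data; } Buf;
typedef struct { Buf *val; } Var;   /* values may be shared between Vars */

static Buf *buf_new(const char *text) {
    Buf *b = malloc(sizeof *b);
    b->len = b->cap = strlen(text);
    b->data = malloc(b->cap + 1);
    memcpy(b->data, text, b->len + 1);
    return b;
}

/* Emitted once, before the loop: give v a private copy of its value.
   The old Buf is left for the GC (other variables may still use it). */
static void clone_into(Var *v) {
    v->val = buf_new(v->val->data);
}

/* Emitted inside the loop: safe only because clone_into() ran first and
   nothing in the loop body can make v's value shared again. */
static void append_in_place(Var *v, const char *text) {
    Buf *b = v->val;
    size_t add = strlen(text);
    if (b->len + add > b->cap) {    /* overflow: grow the buffer */
        b->cap = (b->len + add) * 2;
        b->data = realloc(b->data, b->cap + 1);
    }
    memcpy(b->data + b->len, text, add + 1);
    b->len += add;
}

/* Possible translation of:
       foreach $i (1..$big_loop) { $foo .= ", $i"; }  */
static void compiled_loop(Var *foo, int big_loop) {
    char piece[32];
    clone_into(foo);
    for (int i = 1; i <= big_loop; i++) {
        snprintf(piece, sizeof piece, ", %d", i);
        append_in_place(foo, piece);
    }
}
```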
There's actually a problem with reusing the buffer if $foo's value has
overloaded `concat' and `destroy' operations. If I'm reusing $foo's buffer,
then on $foo .= "xyzyzx" I would call the (overloaded) concat operation of
$foo's value, telling it to use $foo's buffer if possible. This would
probably invalidate $foo's old value. But then I should call $foo's
`destroy', since $foo's old value is no longer valid, and the overloaded
action of $foo's value should take place. But `destroy' would encounter an
invalid value, or might even try to destroy the data in $foo's new value.
And not calling `destroy' wouldn't be the right behaviour either, since it
may do some other kind of cleanup that really is necessary.
My call would be `don't reuse values at all', but I'm quite unsure what
the performance of that would be. Maybe allow the compiler's optimized
version only if $foo's value has a simple `destroy' that doesn't need to be
called when the buffer is reused... Or split `destroy' in two: one that
always gets called, and another that is called only when the value is
cleaned up by the GC...
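The split `destroy' suggested above might look something like this hypothetical C sketch: each value type gets two hooks, `on_replace` (runs whenever a value stops being current, whether it is overwritten in place or collected) and `on_free` (runs only when the GC actually reclaims the buffer). Because `on_replace` runs before the buffer is modified, it still sees the old, valid value. The hook names and `value_concat`/`value_collect` are assumptions for illustration, not an existing API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Value Value;

/* Hypothetical per-type operations, with `destroy' split in two. */
typedef struct {
    void (*on_replace)(Value *v);   /* always: old value is going away */
    void (*on_free)(Value *v);      /* only when the buffer is reclaimed */
} Vtable;

struct Value {
    const Vtable *vt;
    size_t len, cap;
    char  *data;
};

static Value *value_new(const Vtable *vt, const char *text) {
    Value *v = malloc(sizeof *v);
    v->vt = vt;
    v->len = v->cap = strlen(text);
    v->data = malloc(v->cap + 1);
    memcpy(v->data, text, v->len + 1);
    return v;
}

/* $foo .= text, reusing the buffer: the old value logically ends, so
   on_replace runs first (while it is still valid); on_free does not,
   because the buffer lives on in the new value. */
static void value_concat(Value *v, const char *text) {
    size_t add = strlen(text);
    v->vt->on_replace(v);
    if (v->len + add > v->cap) {
        v->cap = (v->len + add) * 2;
        v->data = realloc(v->data, v->cap + 1);
    }
    memcpy(v->data + v->len, text, add + 1);
    v->len += add;
}

/* Called by the GC when nothing refers to the value any more. */
static void value_collect(Value *v) {
    v->vt->on_replace(v);
    v->vt->on_free(v);
    free(v->data);
    free(v);
}

/* An example type whose hooks just count how often they run. */
static int replace_calls, free_calls;
static void count_replace(Value *v) { (void)v; replace_calls++; }
static void count_free(Value *v) { (void)v; free_calls++; }
static const Vtable counting_vt = { count_replace, count_free };
```

With this split, reusing the buffer on .= triggers only the `on_replace' cleanup, and the buffer-level cleanup is deferred until the GC collects the value.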
Any suggestions?
Thanks,
- Branden