Hi,
generally speaking unnecessary clone() of values should of course be
avoided.
In GNU APL 1.0 and 1.1 there was a flag-based system of value ownership
where the last owner
would delete the value when giving up its interest in the value. This
system began like the
tmp flag, but then caused stale values on one hand and segfaults for
values released too early on the
other hand all over the place.
I then changed to Value_P which is pretty much a shared pointer with
reference counting. Initially
I tried the standard shared_ptr<> class of C++ but that caused some
compiler/portability issues.
That change has brought a lot of stability into GNU APL,
As you have noticed yourself, going through the code and changing is all
over the place is bad and
error-prone, and it would kind of undo the change from the flag-system
to Value_P (which was also
a huge effort as you may figure from SVN).
So the only solution remaining is rewriting the Value (or. more likely,
the Value_P) class.
Now, the Value class knows how many owners it has (the share_count below
already exists, it is called owner_count).
But it does not know who they are. If someone does X[Y]←Z then symbol X
would need to clone its current value
when Value::index(Z) os performed, but Value::index(Z) does not know
anything about Symbol X and other owners
of the value,
I will send you the 170+ GNU APL testcases so that you can quickly check
if some change you make is breaking
something else.
Having several Values pointing to the same ravel is also a bit obscure
because the Cells of the ravel need to be
released (sub values and complex numbers need to be deleted) which is
almost impossible if you have nested
ravels or ravels of of different sizes.
I also doubt that you can gain 3-4 orders of magnitude because a value
is normally only cloned very few times.
That does not mean that you can't prove otherwise, I can speed up +/⍳N
by 6 orders of magnitude but that does
not prove that this is a valuable optimization,
/// Jürgen
On 04/28/2014 02:57 PM, Elias Mårtenson wrote:
Hello Jürgen,
I don't know if you have given this issue any thought, but it has
certainly occupied my mind for the last few days.
It's clear that heavy array processing does far too much cloning than
should be necessary. Especially in cases where you have lots of
operations on smaller arrays (as opposed to few operations on large
arrays).
This is because the code always performs a clone prior to a
destructive operation because at that point it can never be sure that
the array will not be used again. The (very few) exceptions to this is
handled by the "temp" system, which is really only used in the ravel
function.
The way I see it, there are two approaches to solving this: The first
one being to go through the code with a fine-toothed comb and
implement the temp system everywhere. This is lots of work and is
error-prone.
The other solution would be to re-engineer the Value class so that it
can share underlying data with other Value instances. The Value class
would then get a counter called share_count or something, indicating
how many other references there are to the same data. When clone() is
called, it will simply create a new Value instance, share the
underlying data and increase share_count. Any destructive operations
would only copy the content if it's not already shared.
/Now, Value_P already implements some of these semantics. Could it be
reused for this?/
/
/
While appealing, this copy-on-write solution might not be perfect
though. The assumption is that the caller would have decremented the
share count (effectively "releasing" the Value) before the called
function tries to modify it. Now, this "release" is similar to the the
"temp" marking. Could there be a better way?
I've been experimenting with this on and off, and my interim results
(as I've mentioned before) show that the potential performance boosts
are massive. We're talking about 3-4 orders of magnitude. Definitely
worth quite a lot of effort, IMHO.
I'm likely going to continue pursuing this, because I personally feel
frustrated when I do something and it's not instantaneous, knowing
that I'm waiting for the interpreter to perform unnecessary
operations. :-)
What are your thoughts on this? What would be the best approach?
Regards,
Elias