Seems to work, and the performance improvements for the functions that have
been optimised are massive. Thanks a lot!

I wonder what other functions would provide the biggest benefit when
optimised in this fashion. I never ran any benchmarks on the optimised
version of +, because it kept crashing on me.

Regards,
Elias


On 15 February 2014 03:18, Juergen Sauermann
<juergen.sauerm...@t-online.de> wrote:

> Hi,
>
> I have added the proposed optimisation, SVN 123.
> Hope it does not break anything.
>
> /// Jürgen
>
>
>
>
> On 02/02/2014 03:35 PM, Frederick H. Pitts wrote:
>
>> Elias,
>>
>>         This approach to optimising GNU APL sounds very reasonable to
>> me.  I
>> hope it succeeds.  May Conway's "Game of Life" go faster. :-)
>>
>> Fred
>>
>> On Sun, 2014-02-02 at 21:19 +0800, Elias Mårtenson wrote:
>>
>>> I made a test implementation of a performance optimisation for GNU
>>> APL. The short story is that it seems to work for limited tests, but
>>> there may be issues with my approach and I don't want to spend any
>>> more time on it unless Jürgen agrees it's a good idea. After all, I
>>> may have completely misunderstood the architecture of GNU APL and my
>>> approach may be fundamentally broken.
>>>
>>>
>>> When doing some benchmarking I noticed a lot of time was spent
>>> allocating, initialising and freeing Value instances. I realised that
>>> many such allocations are not actually needed. Take the following
>>> expression for example:
>>>
>>>
>>>          ,1+2×foo
>>>
>>>
>>> In this case, the following things happen:
>>>       1. The multiplication function (implemented by eval_skalar_AB)
>>>          creates a new array and fills it with the result of the
>>>          operation (in this case, bif_multiply) on each cell.
>>>       2. The addition function is then called, resulting in an
>>>          identical flow as in the multiplication above.
>>>       3. The ravel function is then called (Bif_COMMA::ravel),
>>>          resulting in yet another array being created which is simply
>>>          copied from B. Note that the ravel is identical between Z and
>>>          B in this case.
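>>> The three steps can be sketched roughly like this (a Python model of
>>> the behaviour described above, not GNU APL's actual C++; the function
>>> names are illustrative):

```python
# Illustrative model of evaluating ,1+2×foo as described above: every
# primitive allocates a fresh result array, even when the last step
# (ravel) merely copies its argument unchanged.

def times(a, b):
    return [a * x for x in b]      # allocation 1: new result array

def plus(a, b):
    return [a + x for x in b]      # allocation 2: new result array

def ravel(b):
    return list(b)                 # allocation 3: a plain copy of B

foo = [1, 2, 3]
z = ravel(plus(1, times(2, foo)))  # ,1+2×foo
print(z)                           # → [3, 5, 7]
```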
>>> From a performance perspective, I saw a particularly low-hanging fruit
>>> in the ravel call in the last point. Since the value that came from
>>> the addition is never used anywhere else, one can simply change the
>>> shape of B in place and return it immediately.
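>>> In pseudocode, the in-place variant might look like this (hypothetical
>>> names; the real change would live in Bif_COMMA::ravel):

```python
# Hypothetical sketch: if B is a reusable temporary, keep its cell
# storage and just replace the shape, instead of copying the cells
# into a freshly allocated Value.

class Value:
    def __init__(self, shape, cells, temp=False):
        self.shape = shape   # e.g. (2, 2) for a 2x2 matrix
        self.cells = cells   # cell storage in ravel order
        self.temp = temp     # "may be reused" flag (the VF_temp idea)

def ravel(b):
    if b.temp:
        b.shape = (len(b.cells),)   # reuse B: change the shape in place
        return b
    # B is still referenced elsewhere: fall back to copying
    return Value((len(b.cells),), list(b.cells))

m = Value((2, 2), [1, 2, 3, 4], temp=True)
z = ravel(m)
print(z is m, z.shape)   # → True (4,)  -- no new allocation
```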
>>>
>>>
>>> My experiment involved creating a new VF (value flag) value, VF_temp,
>>> which indicates that the value may be reused. This flag is then set for
>>> all values that are only used once (for example, the return value from
>>> a primitive or user function). If a value needs to be preserved, it is
>>> copied, and the flag is not set.
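>>> The life cycle of the flag might look roughly like this (again a
>>> hypothetical sketch; primitive_result and bind are made-up names):

```python
# Hypothetical sketch of the VF_temp life cycle: a primitive marks its
# result as temporary (it will be consumed exactly once); a value that
# needs to be preserved, e.g. bound to a name, is copied, and the copy
# does not carry the flag.

class Value:
    def __init__(self, cells, temp=False):
        self.cells = cells
        self.temp = temp     # "may be reused" flag

def primitive_result(cells):
    return Value(cells, temp=True)    # fresh result: safe to reuse

def bind(env, name, v):
    # the value gains a second owner: store a non-temp copy instead
    env[name] = Value(list(v.cells), temp=False) if v.temp else v

env = {}
bind(env, 'x', primitive_result([1, 2, 3]))
print(env['x'].temp)   # → False
```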
>>>
>>>
>>> What all of this means is that values that are returned from a
>>> function can be reused when it is safe to do so. There are several
>>> such cases. For example:
>>>        * The comma function (as discussed above)
>>>        * Reshape, especially when the shape of Z matches that of B
>>>        * Scalar functions. If the sizes of the arguments of plus, for
>>>          example, match, then A or B can be reused if they are marked
>>>          as temporary.
>>>        * The each operator can (maybe?) push the result straight into
>>>          the source array.
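>>> The scalar-function case might look like this (a hypothetical sketch
>>> of a dyadic scalar function such as plus; in GNU APL this logic would
>>> sit in eval_skalar_AB, but the Value class here is illustrative):

```python
# Hypothetical sketch: a dyadic scalar function writes its result into
# B's cell storage when B is a reusable temporary of matching size,
# instead of setting up a brand-new result Value.

class Value:
    def __init__(self, cells, temp=False):
        self.cells = cells
        self.temp = temp     # "may be reused" flag

def plus(a, b):
    if b.temp and len(a.cells) == len(b.cells):
        for i in range(len(b.cells)):       # reuse B's cells in place
            b.cells[i] = a.cells[i] + b.cells[i]
        return b
    # sizes differ or B is shared: allocate a fresh (temporary) result
    return Value([x + y for x, y in zip(a.cells, b.cells)], temp=True)

b = Value([2, 3, 4], temp=True)
z = plus(Value([1, 1, 1]), b)
print(z is b, z.cells)   # → True [3, 4, 5]
```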
>>> I've only done very basic testing, but the ravel function on a large
>>> array went from taking a second or so to pretty much zero. The same
>>> should be true for reshape, although that one crashes for me right
>>> now. :-)
>>> Scalar functions still have to process the elements, so their timing
>>> can't go to zero, but at least they should save the time spent setting
>>> up the result array and creating new values.
>>>
>>>
>>> Please let me know what you think.
>>>
>>>
>>> Elias
>>>
>>
>>
>>
>>
>
