Seems to work, and the performance improvements for the functions that have been optimised are massive. Thanks a lot!
I wonder what other functions would provide the biggest benefit when optimised in this fashion. I never ran any benchmarks on the optimised version of +, because it kept crashing on me.

Regards,
Elias

On 15 February 2014 03:18, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:
> Hi,
>
> I have added the proposed optimisation, SVN 123.
> Hope it does not break anything.
>
> /// Jürgen
>
> On 02/02/2014 03:35 PM, Frederick H. Pitts wrote:
>> Elias,
>>
>> This approach to optimizing GNU APL sounds very reasonable to me. I hope it succeeds. May Conway's "Game of Life" go faster. :-)
>>
>> Fred
>>
>> On Sun, 2014-02-02 at 21:19 +0800, Elias Mårtenson wrote:
>>> I made a test implementation of a performance optimisation for GNU APL. The short story is that it seems to work in limited tests, but there may be issues with my approach, and I don't want to spend any more time on it unless Jürgen agrees it's a good idea. After all, I may have completely misunderstood the architecture of GNU APL, and my approach may be fundamentally broken.
>>>
>>> When doing some benchmarking I noticed that a lot of time was spent allocating, initialising and freeing Value instances. I realised that many of these allocations are not actually needed. Take the following expression, for example:
>>>
>>>     ,1+2×foo
>>>
>>> In this case, the following things happen:
>>>
>>>   1. The multiplication function (implemented by eval_skalar_AB) creates a new array and fills it with the result of the operation (in this case, bif_multiply) on each cell.
>>>   2. The addition function is then called, resulting in a flow identical to the multiplication above.
>>>   3. The ravel function is then called (Bif_COMMA::ravel), resulting in yet another array being created which is simply copied from B. Note that the ravel is identical between Z and B in this case.
>>>
>>> From a performance perspective, I saw a particularly low-hanging fruit in the ravel call in the last step. Since the value that came from the addition is never used anywhere else, one can simply change the shape of B in place and return it immediately.
>>>
>>> My experiment involved creating a new VF (value flag) value, VF_temp, that indicates that the value may be reused. This flag is then set for all values that are only used once (for example, the return value from a primitive or user function). If a value needs to be preserved, it is copied, and the flag is not set.
>>>
>>> What all of this means is that values returned from a function can be reused where it makes sense. There are several such cases. For example:
>>>
>>>   * The comma function (as discussed above).
>>>   * Reshape, especially when the shape of Z matches that of B.
>>>   * Scalar functions. If the sizes of the arguments of plus, for example, match, then A or B can be reused if they are marked as temporary.
>>>   * The each operator can (maybe?) push the result straight into the source array.
>>>
>>> I've only done very basic testing, but the ravel of a large array went from taking a second or so to pretty much zero. The same should be true for reshape, although that one crashes for me right now. :-) Scalar functions still have to process all the elements, so their timing can't go to zero, but at least they should save the time spent setting up the result array and creating new values.
>>>
>>> Please let me know what you think.
>>>
>>> Elias
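
To make the core idea in the quoted message a bit more concrete, here is a minimal, self-contained C++ sketch of the ravel case. It deliberately does not use GNU APL's real Value, Cell or Bif_COMMA classes; the TempValue struct, its is_temp member (standing in for the proposed VF_temp flag) and the ravel() function below are illustrative stand-ins only, so read it as a sketch of the technique rather than of the actual implementation.

    // Toy model of "reuse a temporary value instead of allocating a new one".
    // TempValue, is_temp and ravel() are stand-ins, not GNU APL's classes.

    #include <cstdio>
    #include <memory>
    #include <numeric>
    #include <vector>

    struct TempValue
    {
        std::vector<int>    shape;           // e.g. {2, 3} for a 2x3 matrix
        std::vector<double> cells;           // cells in ravel (row-major) order
        bool                is_temp = false; // stands in for the proposed VF_temp flag

        int element_count() const
        {
            return std::accumulate(shape.begin(), shape.end(), 1,
                                   [](int a, int b) { return a * b; });
        }
    };

    using ValueP = std::shared_ptr<TempValue>;

    // Monadic comma.  If B is a temporary (used nowhere else), just rewrite its
    // shape in place; otherwise fall back to the old copy-everything path.
    ValueP ravel(const ValueP& B)
    {
        if (B->is_temp)
        {
            B->shape = { B->element_count() };   // no allocation, no cell copy
            return B;
        }

        auto Z = std::make_shared<TempValue>();
        Z->shape   = { B->element_count() };
        Z->cells   = B->cells;                   // full copy of the cells
        Z->is_temp = true;                       // the result is only used once
        return Z;
    }

    int main()
    {
        // A named variable must never be clobbered, so it is not temporary.
        auto foo = std::make_shared<TempValue>();
        foo->shape = { 2, 3 };
        foo->cells = { 1, 2, 3, 4, 5, 6 };

        ValueP z1 = ravel(foo);                  // old path: cells are copied
        std::printf("ravel of a variable reuses storage: %s\n",
                    z1.get() == foo.get() ? "yes" : "no");

        // The result of 1+2×foo would arrive at the comma marked as temporary.
        auto tmp = std::make_shared<TempValue>(*foo);
        tmp->is_temp = true;

        ValueP z2 = ravel(tmp);                  // new path: reshaped in place
        std::printf("ravel of a temporary reuses storage: %s\n",
                    z2.get() == tmp.get() ? "yes" : "no");
        return 0;
    }

The same in-place trick covers the reshape bullet as well, since only the shape differs between Z and B when the element counts match.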
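
The scalar-function bullet can be sketched along the same lines. The add_AB function below is an assumption about how such a reuse path might be structured, not GNU APL's eval_skalar_AB, and it skips scalar extension and everything else a real scalar primitive has to handle; it only shows the "write the result into a temporary argument when the shapes match" part.

    // Toy sketch of reusing a temporary argument in a dyadic scalar function.
    // Not GNU APL's eval_skalar_AB; scalar extension etc. is left out.

    #include <cstddef>
    #include <cstdio>
    #include <memory>
    #include <stdexcept>
    #include <vector>

    struct TempValue
    {
        std::vector<int>    shape;
        std::vector<double> cells;
        bool                is_temp = false;
    };

    using ValueP = std::shared_ptr<TempValue>;

    // Dyadic plus for two values of equal shape.
    ValueP add_AB(const ValueP& A, const ValueP& B)
    {
        if (A->shape != B->shape)
            throw std::invalid_argument("shape mismatch (not handled in this sketch)");

        // Pick a destination: reuse a temporary argument if there is one,
        // otherwise allocate a fresh result value as before.
        ValueP Z;
        if      (B->is_temp) Z = B;
        else if (A->is_temp) Z = A;
        else
        {
            Z = std::make_shared<TempValue>();
            Z->shape = B->shape;
            Z->cells.resize(B->cells.size());
        }

        for (std::size_t i = 0; i < B->cells.size(); ++i)
            Z->cells[i] = A->cells[i] + B->cells[i];

        Z->is_temp = true;    // the result itself is only used once
        return Z;
    }

    int main()
    {
        auto A = std::make_shared<TempValue>();
        A->shape = { 4 };  A->cells = { 1, 2, 3, 4 };        // a named value

        auto B = std::make_shared<TempValue>();
        B->shape = { 4 };  B->cells = { 10, 20, 30, 40 };
        B->is_temp = true;                                   // e.g. the result of 2×foo

        ValueP Z = add_AB(A, B);
        std::printf("result reuses B's storage: %s\n",
                    Z.get() == B.get() ? "yes" : "no");
        for (double x : Z->cells) std::printf("%g ", x);     // 11 22 33 44
        std::printf("\n");
        return 0;
    }

As the message points out, scalar functions still have to visit every cell, so this only removes the allocation and set-up cost, not the loop itself.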
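
Finally, the "if a value needs to be preserved, it is copied, and the flag is not set" rule is the part that has to be watertight for any of this to be safe. Here is a sketch of that rule, again with toy types and a hypothetical assign_to_variable() helper and workspace map rather than GNU APL's real symbol-table code.

    // Toy sketch of the "preserve" rule: anything stored beyond the current
    // expression (for example, bound to a variable) must not stay flagged as a
    // reusable temporary, or a later primitive could clobber it in place.
    // assign_to_variable() and the workspace map are hypothetical stand-ins.

    #include <cstdio>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    struct TempValue
    {
        std::vector<int>    shape;
        std::vector<double> cells;
        bool                is_temp = false;
    };

    using ValueP = std::shared_ptr<TempValue>;

    std::map<std::string, ValueP> workspace;   // stand-in for named variables

    // FOO ← expr : the stored value must be preserved.
    void assign_to_variable(const std::string& name, const ValueP& value)
    {
        if (!value->is_temp)
        {
            workspace[name] = value;           // already a durable value
            return;
        }

        // Per the rule in the message: copy the value and do not set the flag
        // on the copy.  (Clearing the flag on the original instead would also
        // work, but only if nothing else still refers to it.)
        auto copy = std::make_shared<TempValue>(*value);
        copy->is_temp = false;
        workspace[name] = copy;
    }

    int main()
    {
        auto tmp = std::make_shared<TempValue>();
        tmp->shape = { 3 };  tmp->cells = { 7, 8, 9 };
        tmp->is_temp = true;                   // e.g. the result of 1+2×foo

        assign_to_variable("FOO", tmp);        // FOO must survive even if ...
        tmp->cells[0] = -1;                    // ... tmp is later reused in place

        std::printf("FOO[0] is still %g\n", workspace["FOO"]->cells[0]);  // 7
        return 0;
    }

In a reference-counted implementation the copy could be skipped whenever the temporary has exactly one owner; the sketch keeps the simple copy path that the message describes.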