If you want to look at the code: all arrays are implemented as instances of
Value. You will also see references to Value_P, which is just a
reference-counted pointer to a Value. A Value has a Shape, which defines its
dimensions.

Code that processes array content does so using the method .get_ravel() and
its various overloads.

I'm thinking that it might be possible to create an optimised primitive
array by subclassing Value (or rather, extracting a common superclass from
it) into something that can handle certain arrays much faster. Whether that
is actually faster depends on where the time is spent: if you still have to
waste time boxing the values returned from .get_ravel(), then there is
little point.

The first step is to run your tests through cachegrind to determine exactly
where the bottlenecks lie.

I have done a cachegrind analysis before, and I determined that most of the
time was actually spent copying the arrays. The problem is that in an
expression such as A←A+1, where A is a large array, the entire content is
copied and 1 is then added to each element. However, since A is overwritten
by the assignment, a lot of time can be saved if the addition is done
in place. With an expression such as A←⍉1+2×A you end up with three
unnecessary copies and initialisations of the entire array, and the time
spent there is much larger than the time taken to actually perform the
mathematical operations. I even implemented an optimisation for this, but
it was rolled back because it was severely broken. While it worked,
however, it provided a performance improvement of an order of magnitude
or more.

You might want to read up in the mailing list archives from a year or so
ago where all of this was discussed in depth.

Regards,
Elias

On 21 August 2015 at 12:49, Mike Duvos <m...@wolf359.net> wrote:

> Hi Elias,
>
> It seems that there would be a substantial performance hit using C++
> object management to construct your workspace.  I haven't run any
> benchmarks before, but perhaps a quick comparison with APL2 would be useful
> at this point.
>
> Let's make a little function that does something 100 times.
>
>       ∇DO100[⎕]∇
>     ∇
> [0]   DO100 X;I
> [1]   I←¯1
> [2]  L1:→((I←I+1)≥100)/0
> [3]   ⍎X
> [4]   →L1
>     ∇ 2015-08-20 21.00.29.850 (GMT-7)
>
> And another little function that prints out the number of seconds
> something takes.
>
>       ∇TIME[⎕]∇
>     ∇
> [0]   TIME X;TS
> [1]   TS←⎕TS
> [2]   ⍎X
> [3]   (⍕(24 60 60 1000⊥¯4↑⎕TS-TS)÷1000),' Seconds.'
>     ∇ 2015-08-20 21.01.09.470 (GMT-7)
>
> And an array of a million floats.
>
>       A←1000 1000⍴⍳1E6
>
> And a 100x100 boolean identity matrix
>
>       B←100 100⍴101↑1=1
>
> [IBM APL2]
>
>       TIME  'DO100 ''A←⍉A'''
> 2.391 Seconds.
>
>       TIME 'DO100 ''B←B∨.∧B'''
> 7.591 Seconds.
>
>       TIME 'DO100 ''C←≠\B'''
> 0.197 Seconds.
>
> [GNU APL]
>
>       TIME    'DO100 ''A←⍉A'''
> 54.987 Seconds.
>
>       TIME 'DO100 ''B←B∨.∧B'''
> 13.325 Seconds.
>
>       TIME 'DO100 ''C←≠\B'''
> 59.361 Seconds.
>
> That's actually not too horrible, considering what it's having to do.
>
> Regards,
>
> Mike
>
>
>
