Also, on the issue of loop unrolling and efficient looping:
PDL has what we call 'threading'.
This allows a C-level function to specify the dimensionality of
the arguments it accepts. For example, a hypothetical function
addtoline(), which adds a constant to a row vector, might have the
'signature'
a(n); b(); [o]c(n);
So a(n) is a 1D input, b() is a scalar (0D) that addtoline adds to
every element of a(n), and c(n) is the 1D output.
What is really useful is that if you add extra dimensions, they are
looped over automatically at the C level, so it is really fast.
For example, with a(100,10), b(10), c(100,10), the call adds b to all
10 rows of a, each row getting its own element of b.
ALSO, the way it is implemented, the array pointers are calculated
by C macros in such a way as to support transpositions and slicing
with zero memory overhead. Thus if I want to add 10 to every *column*
of a, within a slice:
addtoline $a->xchg(0,1)->slice("10:20,20:40"), 10, $c
Here xchg creates a virtual transposition of the first two dims and slice
creates a virtual slice; no data is copied. This is all done by storing
extra info in the $a object.
I think these ideas would be useful in any discussion of perl6 numerical
efficiency, though there are other approaches, I guess. The core idea is to
stay inside compiled loops as much as possible.
Another advantage of this 'threading' is that it expresses the problem in a
naturally parallel form: we even have an experimental PDL implementation that
can use multiple CPUs to run the 'threads'.
One problem we are continually faced with in PDL is that we do all this at
the C level, so pure-perl PDL functions can't do these tricks.
Another problem is that, while one can usually write many complicated
multi-dim problems with threading tricks and avoid loops, it is sometimes a
bit taxing on the brain! One often wishes one could just write C/Fortran-style
loops and have the language figure out how to run them efficiently.
Anyway, integrating some concepts for handling large numerical computations
into the core would definitely be a good thing.
Karl Glazebrook