So it's the mid-point of GSoC and a good time to review what's already done 
and what is yet to be done.

The main focus of the first half of the summer was making NDArray work. 
NDArray is a new implementation of core.matrix protocols, intended to become 
the default one. It's implemented in pure Clojure as a strided array over 
"flat" Java arrays, similar to NumPy's ndarray.
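
To make the "strided array over a flat array" idea concrete, here is a 
minimal sketch in Python (chosen only for brevity; it is not the NDArray 
code, and the names StridedView, row_major, get and transpose are made up 
for illustration). The position of an element in the flat storage is the 
offset plus the sum of each index multiplied by its stride, so a transpose 
can be a new view with reversed strides, without copying any data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StridedView:
    """Hypothetical N-dimensional view over flat storage (illustration only)."""
    data: List[float]          # flat backing storage, shared between views
    shape: Tuple[int, ...]     # size of each dimension
    strides: Tuple[int, ...]   # step in `data` per unit step in each dimension
    offset: int = 0            # position of element (0, 0, ..., 0) in `data`

    def flat_index(self, idx: Tuple[int, ...]) -> int:
        # "true" index = offset + sum(index_k * stride_k)
        return self.offset + sum(i * s for i, s in zip(idx, self.strides))

    def get(self, *idx: int) -> float:
        return self.data[self.flat_index(idx)]

    def transpose(self) -> "StridedView":
        # Reversing shape and strides gives the transposed view; no copy is made.
        return StridedView(self.data, self.shape[::-1], self.strides[::-1], self.offset)


def row_major(data: List[float], shape: Tuple[int, ...]) -> StridedView:
    # Row-major (C order) strides: the last dimension is contiguous.
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return StridedView(data, shape, tuple(reversed(strides)))


m = row_major([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], (2, 3))  # rows: [1 2 3], [4 5 6]
t = m.transpose()                                       # 3x2 view, same storage
```

Here m.get(1, 2) and t.get(2, 1) read the same slot of the shared flat list.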

Well, it works. Some highlights:

 - It works as a fully functional, N-dimensional array as part of 
core.matrix
 - It passes all current tests
 - A few bugs were fixed in core.matrix itself, uncovered during 
implementation testing.
 - An overview of implemented protocols is now generated directly from the 
code as an HTML page and can be seen at [1].
 - Some of the "default implementations" within core.matrix have been 
rewritten to be faster and use NDArray internally where it's appropriate.

You can find examples of NDArray usage at [2].

An important issue in such libraries is performance. The first and most 
obvious step to win some is to use specialized primitive arrays instead of 
Object ones; the trouble is that most of the code then has to be duplicated 
for every element type. Clojure, with its homoiconicity and ease of code 
juggling, is a big win here: after introducing some "magic", the main 
implementation of NDArray is almost completely free of boring repetition 
[3], despite generating code for 4 different types (and it's easily 
extensible).

Then it's time for type hinting, the gory little details of hidden type 
coercion, and playing with memory access patterns. Matrix multiplication 
was a very important playground for me: it's fairly easy on the surface, 
well studied and easily checkable. Moreover, I have a reference point: Mike 
Anderson's Vectorz library [4]. After some time I've managed to beat it, at 
least on large matrices [5] (look at "ndarray-double"; "ndarray" now uses 
the "default" implementation, which was based on persistent vectors in the 
past and became 2-3x faster after the rewrite). There is plenty of room for 
improvement on small matrices, though. I'm writing a big post on this 
optimization experience, stay tuned :)
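
As one illustration of what "playing with memory access patterns" can mean 
for matrix multiplication, here is a generic textbook technique, loop 
reordering, sketched in Python. This is an assumption about the kind of 
tuning involved, not a description of what NDArray actually does, and the 
function name matmul_ikj is made up. Both matrices are flat row-major 
lists; with the i-k-j loop order the inner loop walks b and c with step 1 
instead of jumping through b by a whole row on every iteration.

```python
def matmul_ikj(a, b, n, m, p):
    """Multiply an n x m matrix by an m x p matrix, both flat row-major lists."""
    c = [0.0] * (n * p)
    for i in range(n):
        c_row = i * p                    # start of row i in c
        for k in range(m):
            a_ik = a[i * m + k]          # read once, reused by the whole inner loop
            b_row = k * p                # start of row k in b
            for j in range(p):
                # The inner loop touches b and c at consecutive positions.
                c[c_row + j] += a_ik * b[b_row + j]
    return c


square = matmul_ikj([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2, 2, 2)
rect = matmul_ikj([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 1, 3, 2)
```

In Python itself the ordering makes little difference; the point is only 
which elements the innermost loop touches.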

What's next? A couple of interesting problems/tasks still await:

- a lot of elementwise operations are very repetitive in code and are 
kept separate from one another only because of the performance overhead of 
"map-like" dynamic solutions. They should really be a macro in the case of 
NDArray; and what stops us from exposing this macro through an API so users 
can write very efficient custom elementwise operations? AFAIK, nothing. I'm 
very curious about how far this approach can be pushed.
- for now there are some repetitive patterns of iterating index-wise over 
an array and finding the "true" index inside it by multiplying indexes by 
strides. First benchmarks show that rewriting these loops to track the 
"true" index directly, stepping by stride instead of by one, is beneficial 
to performance. An open question is whether it's possible to hide this in 
code generation without significantly hurting readability of the code.
- can we come up with a better way to slice arrays in arbitrary ways 
(boolean vectors? boolean functions?) and at the same time make it 
efficient?
- how hard would it be to port all this to CLJS and be efficient there too?
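
The second item above can be sketched as follows. This is a Python 
illustration, not the NDArray code, and the function names are made up. It 
sums one column of a 2D strided array in two ways: first by recomputing the 
"true" index from the indexes and strides on every iteration, then by 
computing the starting position once and stepping by the row stride.

```python
def sum_column_multiply(data, offset, strides, n_rows, col):
    # Recompute the flat index from scratch on every iteration.
    total = 0.0
    for i in range(n_rows):
        total += data[offset + i * strides[0] + col * strides[1]]
    return total


def sum_column_step(data, offset, strides, n_rows, col):
    # Compute the starting flat index once, then advance it by the row stride.
    total = 0.0
    flat = offset + col * strides[1]
    for _ in range(n_rows):
        total += data[flat]
        flat += strides[0]
    return total


# A 2x3 row-major array [[1 2 3] [4 5 6]]: strides are (3, 1), offset is 0.
flat_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
by_multiply = sum_column_multiply(flat_data, 0, (3, 1), 2, 1)
by_step = sum_column_step(flat_data, 0, (3, 1), 2, 1)
```

Both functions visit the same elements; the second one replaces the 
per-iteration multiplications with a single addition.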

Of course, there is also a huge pile of more mundane, but even more 
important work: documentation (will need to patch Marginalia, I suppose), 
tests (I'm very happy that simple-check [6] already exists; property-based 
testing is a giant leap forward that is too often overlooked), 
infrastructure (currently I'm trying to find the best format for presenting 
performance test results on an HTML page) and more optimization/protocol 
reimplementation.

It has been a very interesting experience so far to work full-time on an 
open-source library. Making the library faster and more correct was not 
so glossy and bleeding-edge, but definitely very interesting. I've learned 
a lot about numerical programming and Clojure; I'm very pleased by the rise 
in my Clojure productivity during this project so far. It was also very 
enjoyable to work with my mentor, Mike Anderson. I'm looking forward to 
pushing core.matrix as far as I can during the time left.

[1]: http://mikera.github.io/matrix-api/summary.html
[2]: https://gist.github.com/si14/6131125
[3]: https://github.com/mikera/matrix-api/blob/develop/src/main/clojure/clojure/core/matrix/impl/ndarray.clj
[4]: https://github.com/mikera/vectorz/
[5]: https://gist.github.com/si14/6127020
[6]: https://github.com/reiddraper/simple-check
