So it's the mid-point of GSoC and a good time to review what's already done and what is yet to be done.
The main focus of the first half of the summer was making NDArray work. NDArray is a new implementation of the core.matrix protocols, intended to become the default one. It's implemented in pure Clojure as a strided array over "flat" Java arrays, similar to NumPy's ndarray. Well, it works. Some highlights:

- It works as a fully functional, N-dimensional array as part of core.matrix.
- It passes all current tests.
- A few bugs in core.matrix itself, uncovered while testing the implementation, were fixed.
- An overview of implemented protocols is now generated directly from the code as an HTML page and can be seen at [1].
- Some of the "default implementations" within core.matrix have been rewritten to be faster and to use NDArray internally where appropriate.

You can find examples of NDArray usage at [2].

An important issue in such libraries is performance. The first and most obvious step to gain some is to use specialized primitive arrays instead of Object ones; the trouble is that most of the code then ends up duplicated many times. Clojure, with its homoiconicity and ease of code juggling, is a big win here: after the introduction of some "magic", the main implementation of ndarray is almost completely free of boring repetition [3], despite generating code for 4 different types (and it's easily extensible).

Then it's time for type hinting, the gory little details of hidden type coercion, and playing with memory access patterns. Matrix multiplication was a very important playground for me: it's fairly easy on the surface, well studied and easily checkable. Moreover, I have a reference point: Mike Anderson's Vectorz library [4]. After some time I've managed to beat it, at least on large matrices [5] (look at "ndarray-double"; "ndarray" now uses the "default" implementation, which was based on persistent vectors in the past and became 2-3x faster after the rewrite). There is plenty of room for improvement on small matrices, though. I'm writing a big post on this optimization experience, stay tuned :)

What's next?
A couple of interesting problems/tasks still await:

- A lot of elementwise operations are very repetitive in code and are kept separate from one another only because of the performance overhead of "map-like" dynamic solutions. They should really be a macro in the case of NDArray; and what stops us from exposing this macro through an API so users can write very efficient custom elementwise operations? AFAIK, nothing. I'm very curious about how far this approach can be pushed.
- For now there are some repetitive patterns of iterating index-wise over an array and finding the "true" index inside it by multiplying indexes by strides. First benchmarks show that rewriting these loops to track the "true" index directly, stepping by the stride instead of by one, is beneficial to performance. An open question is whether it's possible to hide this in code generation without significantly hurting the readability of the code.
- Can we come up with a better way to slice arrays in arbitrary ways (boolean vectors? boolean functions?) and at the same time make it efficient?
- How hard would it be to port all this to CLJS and be efficient there too?

Of course, there is also a huge pile of more mundane, but even more important work: documentation (will need to patch Marginalia, I suppose), tests (I'm very happy that simple-check [6] already exists; property-based testing is a giant leap forward, too often overlooked), infrastructure (currently I'm trying to find the best format to present performance test results on an HTML page) and more optimization/protocol reimplementation.

It has been a very interesting experience to date to work full-time on an open-source library. Making the library faster and more correct was not so glossy and bleeding-edge, but it was definitely very interesting. I've learned a lot about numerical programming and Clojure; I'm very pleased by the rise in my Clojure productivity during this project so far. It was also very enjoyable to work with my mentor, Mike Anderson.
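As an aside on the stride-stepping idea from the task list: here is a minimal sketch of the two loop shapes, summing one row of a 2-D strided view over a flat double array. It is not taken from the NDArray code; the function names and the map layout (`:data`, `:offset`, `:strides`, `:shape`) are made up for illustration, and it assumes a 2-D view backed by a `double[]`.

```clojure
;; Hypothetical sketch, not code from core.matrix / NDArray.
;; A "view" is a map: flat double array + offset + strides + shape.

(defn row-sum-by-multiplying
  "Sums row i; the flat index is recomputed from the logical
   indexes and the strides on every iteration."
  [{:keys [data offset strides shape]} i]
  (let [^doubles data data
        offset (long offset)
        s0     (long (strides 0))
        s1     (long (strides 1))
        ncols  (long (shape 1))
        i      (long i)]
    (loop [j 0 acc 0.0]
      (if (< j ncols)
        (recur (inc j)
               (+ acc (aget data (+ offset (* i s0) (* j s1)))))
        acc))))

(defn row-sum-by-stepping
  "Sums row i; the flat index is computed once and then advanced
   by the column stride, so the loop body has no multiplications."
  [{:keys [data offset strides shape]} i]
  (let [^doubles data data
        s1    (long (strides 1))
        start (+ (long offset) (* (long i) (long (strides 0))))
        ncols (long (shape 1))]
    (loop [n ncols idx start acc 0.0]
      (if (pos? n)
        (recur (dec n) (+ idx s1) (+ acc (aget data idx)))
        acc))))

;; A 3x4 row-major matrix holding 0.0 .. 11.0, and its transpose,
;; which shares the same flat array and only swaps shape and strides.
(def example   {:data (double-array (range 12))
                :offset 0 :strides [4 1] :shape [3 4]})
(def example-t (assoc example :strides [1 4] :shape [4 3]))

(row-sum-by-multiplying example 1) ;; => 22.0  (4 + 5 + 6 + 7)
(row-sum-by-stepping example 1)    ;; => 22.0
(row-sum-by-stepping example-t 1)  ;; => 15.0  (1 + 5 + 9)
```

Both functions return the same sums; the second only changes how the flat index is produced. Whether that difference can be hidden behind code generation is exactly the open question above.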
I'm looking forward to pushing core.matrix as far as I can during the time left.

[1]: http://mikera.github.io/matrix-api/summary.html
[2]: https://gist.github.com/si14/6131125
[3]: https://github.com/mikera/matrix-api/blob/develop/src/main/clojure/clojure/core/matrix/impl/ndarray.clj
[4]: https://github.com/mikera/vectorz/
[5]: https://gist.github.com/si14/6127020
[6]: https://github.com/reiddraper/simple-check