Re: [ANN] Neanderthal, a fast, native matrix and linear algebra library for Clojure released + call for help

Dragan Djuric Mon, 22 Jun 2015 02:05:33 -0700


On Monday, June 22, 2015 at 2:02:19 AM UTC+2, Mikera wrote:
>
>
> There is nothing fundamentally wrong with BLAS/LAPACK, it just isn't 
> suitable as a general purpose array programming API. See my comments 
> further below.
>


I was discussing it from the *matrix API* perspective. My comments follow:
 

> If you think the core.matrix API is "unintuitive and complicated" then I'd 
> love to hear specific examples. We're still open to changing things before 
> we hit 1.0
>

I will only give a couple of basic ones, but I think they paint a bigger
picture. Let's say I am a Clojure programmer with no huge experience in
numerical computing. I do have some knowledge of linear algebra and have
a textbook or a paper with an algorithm that I need, which is based on some
linear algebra operations. I'd say that this is the most common use case
for an API such as core.matrix, and I hope you agree. After trying to write
my own loops and recursion and failing to do it well, I shop around and find
core.matrix with its cool proposal: a lot of numerical stuff in Clojure,
with pluggable implementations. Yahooo! My problem is almost solved. I get
to work right away:

1. I add it to my project and try the + example from the github page. It 
works.
2. Now I start implementing my algorithm. How to add-and-multiply a few
matrices? THERE IS NO API DOC. I have to google and find
https://github.com/mikera/core.matrix/wiki/Vectors-vs.-matrices so I guess
it's mmul, but there is a lot of talk about some loosely related
implementation details. Column matrices, slices, ndarrays... What? A lot of
implementation-dependent info, almost no info on what I need (API).
3. I read the mailing list and the source code and, if I manage to filter
the API information out of a lot of implementation discussion, I manage to
draw a rough sketch of what I need (API).
4. I implement my algorithm with the default implementation (vectorz) and 
it works. I measure the performance, and as soon as the data size becomes a 
little more serious, it's too slow. No problem - pluggable implementations 
are here. Surely that Clatrix thing must be blazingly fast, it's native. I 
switch the implementations in no time, and get even poorer performance. 
WHAT?
5. I try to find help on the mailing list. I was using the implementation
in the wrong way. WHY? It was all right with vectorz! Well, we haven't quite
implemented it fully. A lot of functions are fallbacks. The implementation
is not suitable for that particular call... Seriously? It's featured on the
front page!
6. But, what is the right way to use it? I want to learn. THERE IS NO INFO. 
But, look at this, you can treat a Clojure vector as a quaternion and 
multiply it with a JSON hash-map, which is treated as a matrix of 
characters (OK, I am exaggerating, but not that much :)
etc, etc... 
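
For reference, the add-and-multiply from step 2 ends up looking roughly like
the sketch below. It assumes the clojure.core.matrix namespace and its
matrix, mmul, add and equals functions; the matrices a, b and c are made-up
values for illustration only.

```clojure
(require '[clojure.core.matrix :as m])

;; Made-up 2x2 inputs, for illustration only.
(def a (m/matrix [[1 2] [3 4]]))
(def b (m/matrix [[5 6] [7 8]]))
(def c (m/matrix [[1 0] [0 1]]))

;; Matrix product of a and b, followed by an elementwise addition of c.
(def result (m/add (m/mmul a b) c))

;; Compare numerically rather than relying on the concrete return type,
;; which depends on the implementation in use.
(m/equals result [[20 22] [43 51]])
```

Which concrete type result has, and how fast mmul runs, depends on the
implementation that is currently selected.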

> But it certainly isn't "arbitrarily invented". Please note that we have 
> collectively considered a *lot* of previous work in the development of 
> core.matrix. People involved in the design have had experience with BLAS, 
> Fortran, NumPy, R, APL, numerous Java libraries, GPU acceleration, low 
> level assembly coding etc. We'd welcome your contributions too.... but I 
> hope you will first take the time to read the mailing list history etc. and 
> gain an appreciation for the design decisions.
>

I have read lots of those discussions before. I may or may not agree, fully
or partially, with what was written, but I see that the result is far from
what the numerical computing literature I have read recommends, and I do
not see the core.matrix implementations proving that literature wrong.

 
>
>>
>> In my opinion, the best way to create a standard API is to grow it from 
>> successful implementations, instead of writing it first, and then 
>> shoehorning the implementations to fit it.
>>
>
> It is (comparatively) easy to write an API for a specific implementation 
> that supports a few specific operations and/or meets a specific use case. 
> The original Clatrix is an example of one such library.
>

Can you point me to some cases where switching the implementation of an
algorithm from vectorz to clatrix shows a performance boost?
And, easy? Surely then the Clatrix implementation would be fully
implemented and properly supported (and documented) 2-3 years after
it was included?
 

> But that soon falls apart when you realise that the API+implementation 
> doesn't meet  broader requirements, so you quickly get fragmentation e.g.
> - someone else creates a pure-JVM API for those who can't use native code 
> (e.g. vectorz-clj)
>

So, what is wrong with that? There are dozens of Clojure libraries for SQL, 
http, visualization, etc, and all have their place.
 

> - someone else produces a similar library with a new API that wins on some 
> benchmarks (e.g. Neanderthal)
>

I get your point, but would just note that Neanderthal wins *ALL* benchmarks
(that fit the use cases that I need). Not because it does anything too
clever, but because it stands on the shoulders of giants (ATLAS).
 

> - someone else needs arrays that support non-numerical scalar types (e.g. 
> core.matrix NDArray)
> - a library becomes unmaintained and someone forks a replacement
> - someone wants to integrate a Java matrix library for legacy reasons
> - someone else has a bad case of NIH syndrome and creates a whole new 
> library
>

That could be said about virtually every application domain. Why are there
many http, html, javascript, database APIs? Why don't we have one API that
could be used for any existing library? It's not that people didn't try. I
prefer the microframework approach to a monolithic framework that has one
true way.
 

>
> Before long you have a fragmented ecosystem with many libraries, many 
> different APIs and many annoyed / confused users who can't easily get their 
> tools to work together. Many of us have seen this happen before in other 
> contexts, and we don't want to see the same thing to happen for Clojure.
>

How many implementations of core.matrix work *WELL* together, for all 
supported use cases today?
 

> core.matrix solves the problem of library fragmentation by providing a 
> common abstract API, while allowing users choice over which underlying 
> implementation suits their particular needs best. To my knowledge Clojure 
> is the *only* language ecosystem that has developed such a capability, and 
> it has already proved extremely useful for many users. 
>

How many choices are there today that fully and properly implement
core.matrix?

> So if you see people asking for Neanderthal to join the core.matrix 
> ecosystem, hopefully this helps to explain why.
>

As I explained, I would *LOVE* to be able to do such an integration and
benefit from it myself. But I failed to see how to do it properly and
satisfy core.matrix goals (and my goals with Neanderthal) at the same time.

Currently, I see core.matrix as a formula: idea + ? = success
I am not saying that I dislike the idea in general. I would *LOVE* to see
such a thing. I do not yet see what the "?" is, and the current offering
does not convince me that other people (core.matrix) can see it either.
 

>
>> a) I would rather see the core.matrix interoperability as an additional 
>> separate project first, and when/if it shows its value, and there is a 
>> person willing to maintain that part of the code, consider adding it to 
>> Neanderthal. I wouldn't see it as a second rate, and no fork is needed 
>> because of Clojure's extend-type/extend-protocol mechanism. 
>>
>
>  

> While this could work from a technical perspective, I would encourage you 
> to integrate core.matrix support directly into Neanderthal, for at least 
> three reasons:
> a) It will allow you to save the effort of creating and maintaining a 
> whole duplicate API, when you can simply adopt the core.matrix API (for 
> many operations)
>

If I could do it simply, I would have already done that. I do not have to
maintain a duplicate API now, though. I can maintain a simple Neanderthal
API that I understand, which is based on BLAS/LAPACK, with lots of
literature and know-how available online and offline, which does one thing
and (in my opinion) does it well, and leave core.matrix integration to
anyone who needs it.
 

> b) It will reduce maintenance, testing and deployment effort (for you and 
> for others)
>

It would, if core.matrix were a good fit. However, I have failed to see it
that way so far.
 

> c) You are much more likely to get outside contributors if the library 
> forms a coherent whole and plays nicely with the rest of the ecosystem
>

That is true. However, I would rather have a library that fits my needs
well even if it attracts fewer people. And I do not see how it is
difficult to integrate Neanderthal with other libraries, since I have used
it with plotting libraries (clojure/java and external), and the integration
was straightforward.
 

> This really isn't hard - in the first instance it is just a matter of 
> implementing a few core protocols. To get full performance, you would need 
> to implement more of the protocols, but that could be added over time.
>  
>
>> b) I am not sure about what's exactly "wrong" with core.matrix. Maybe 
>> nothing is wrong. The first thing that I am interested in is what do 
>> core.matrix team think is wrong with BLAS/LAPACK in the first place to be 
>> able to form an opinion in that regard
>>
>
> BLAS/LAPACK is a low level implementation. core.matrix is a higher level 
> abstraction of array programming. They simply aren't comparable in a 
> meaningful way. It's like comparing the HTTP protocol with the Apache web 
> server.
>

Here I have to disagree. BLAS/LAPACK is not a low level implementation. It
is a *DE FACTO STANDARD* for numerical linear algebra for dense matrices:
http://www.netlib.org/blas/blast-forum/blas-report.pdf. There are many
implementations of that standard, and they set a really high mark. Besides
ATLAS, there are Intel MKL, OpenBLAS, cuBLAS, clBLAS, and many other highly
performant libraries, all implementing an API that has been crafted for
decades and is as battle tested as a library could be.


> You could certainly use BLAS/LAPACK to create a core.matrix implementation 
> (which is roughly what Clatrix does, and what Neanderthal could do if it 
> became a core.matrix implementation). Performance of this implementation 
> should roughly match raw BLAS/LAPACK (all that core.matrix requires is the 
> protocol dispatch overhead, which is pretty minimal and only O(1) per 
> operation so it quickly becomes irrelevant for operations on large arrays).
>

Looking at the state of Clatrix integration, I have to disagree with that.
Certainly anybody *COULD* program anything (hypothetically), but I'd stay
more down to earth: what *IS*, and what are the reasons that it *IS NOT
(yet)*.
 

>
> In terms of API, core.matrix is *far* more powerful than BLAS/LAPACK. 
>

I do not agree. For *numerical linear algebra* it is not even close to 
BLAS/LAPACK. I agree BLAS/LAPACK is not a good date parser, and I am glad 
that it is not :)
 

> Some examples:
> - Support for arbitrary N-dimensional arrays (slicing, reshaping, 
> multi-dimensional transposes etc.)
>

And core.matrix is? Compared to NumPy? Compared to state-of-the-art tensor
libraries? (Torch, etc.)
 

> - General purpose array programming operations (analogous to NumPy and APL)
>

See previous.
 

> - Independence from underlying implementation. You can support pure-JVM 
> implementations (like vectorz-clj for example), native implementations, GPU 
> implementations.
>

That could be said for any API. Neanderthal has interfaces, which I think
are simpler than core.matrix, and it also *can* support pure-JVM, native,
GPU. The question is: which of those things *ARE* supported? Or maybe the
answer to the question "why are they not supported" would explain why I do
not feel core.matrix is such a wonderful solution for a generally noble goal.
 

> - Support for arbitrary scalar types (complex numbers? strings? dates? 
> quaternions anyone?)
>

Where is that support for complex numbers? (To remind you, BLAS/LAPACK 
already has it, although I didn't implement that part in Neanderthal yet, 
since I didn't need it).
 

> - Transparent support for both dense and sparse matrices with the same API
>

What is the performance of such operations? There is a reason dense/sparse 
APIs are different in numerical libraries.
 

> - Support for both mutable and immutable arrays
>

While I support such a thing as an effort in the name of elegance, in
*numerical computing* mutable structures are what matters, and that is
the first thing that any literature stresses.
 

> - Transparent support for the in-built Clojure data structures (Clojure 
> persistent vectors etc.)
>

Transparent is what worries me. Looks appealing, but shoots me in the foot. 
Again, in *numerical computing*, when I want to convert something, I want 
it to be explicit.
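
A minimal sketch of that explicit style, in plain Clojure with no library
involved. The axpy! function is a hypothetical helper written only for this
example (named after the BLAS axpy operation, y := alpha * x + y); it is not
the Neanderthal or core.matrix API. The conversions to and from primitive
arrays are visible at the call site, and the update happens in place:

```clojure
;; Hypothetical helper for illustration, not part of any library.
(defn axpy!
  "y := alpha * x + y, computed in place. Returns the mutated y."
  [^double alpha ^doubles x ^doubles y]
  (dotimes [i (alength x)]
    (aset y i (+ (* alpha (aget x i)) (aget y i))))
  y)

;; Explicit conversion from persistent vectors to primitive arrays.
(def x (double-array [1.0 2.0 3.0]))
(def y (double-array [10.0 20.0 30.0]))

;; Mutates y; x is left untouched.
(axpy! 2.0 x y)

;; Explicit conversion back to a persistent vector.
(vec y) ; => [12.0 24.0 36.0]
```

Nothing here is converted behind my back: if a persistent vector ends up in
a numerical routine, it is because I put a double-array call there.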
 

> - Support for mixing different array types
> - Supports for convenience operations such as broadcasting, coercion
>

See the previous point.
 

> If you build an API that supports all of that with a reasonably coherent 
> design... then you'll probably end up with something very similar to 
> core.matrix
>

On the other hand, I may find such an API unnecessarily complex and
overblown. So, I guess that we just have a different perspective here.

Thank you for the great effort, though. Even if I do not see that it fits 
my needs, I am glad that it is a great tool for other people.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en