Re: [ANN] Neanderthal, a fast, native matrix and linear algebra library for Clojure released + call for help

Mikera Mon, 22 Jun 2015 04:57:00 -0700

Hi Dragan,

The situation as I see it:
- You've created a matrix library that performs well on one benchmark 
(dense matrix multiplication). 
- Neanderthal meets your own personal use cases. Great job!
- Neanderthal *doesn't* fit the use cases of many others (e.g. some need a 
portable pure JVM implementation, so Neanderthal is immediately out)
- Fortunately, in the Clojure world we have a unique way for such libraries 
to interoperate smoothly with a common API (core.matrix)
- Neanderthal could fit nicely in this ecosystem (possibly it could even 
replace Clatrix, which as you note hasn't really been maintained for a 
while...)
- For some strange reason, it *appears to me* that you don't want to 
collaborate. If I perceive wrongly, then I apologise.


If you want to work together with the rest of the community, that's great. 
I'm personally happy to help you make Neanderthal into a great matrix 
implementation that works well with core.matrix. I'm 100% sure that is a 
relatively simple and achievable goal, having done it already with 
vectorz-clj.

If on the other hand your intention is to go your own way and build 
something that is totally independent and incompatible, that is of course 
your right but I think that's a really bad idea and would be detrimental to 
the community as a whole. Fragmentation is a likely result. At worst, 
you'll be stuck maintaining a library with virtually no users (the Clojure 
community is fairly small anyway... and it is pretty lonely to be a 
minority within a minority)

I can see from your comments below that you still don't understand 
core.matrix. I'd be happy to help clarify if you are seriously interested 
in being part of the ecosystem. Ultimately I think you have some talent, 
you have obviously put in a decent amount of work and Neanderthal could be 
a great library *if and only if* it works well with the rest of the 
ecosystem and you are personally willing to collaborate. 

Your call.

On Monday, 22 June 2015 10:05:15 UTC+1, Dragan Djuric wrote:
>
>
>
> On Monday, June 22, 2015 at 2:02:19 AM UTC+2, Mikera wrote:
>>
>>
>> There is nothing fundamentally wrong with BLAS/LAPACK, it just isn't 
>> suitable as a general purpose array programming API. See my comments 
>> further below.
>>
>
> I was discussing it from the *matrix API* perspective. My comments follow:
>  
>
>> If you think the core.matrix API is "unintuitive and complicated" then 
>> I'd love to hear specific examples. We're still open to changing things 
>> before we hit 1.0
>>
>
> I will only give a couple of basic ones, but I think they draw a bigger 
> picture. Let's say I am a Clojure programmer with no huge experience in 
> numerical computing. I do have some knowledge about linear algebra and have 
> a textbook or a paper with an algorithm that I need, which is based on some 
> linear algebra operations. I'd say that this is the most common use case 
> for an API such as core.matrix, and I hope you agree. After trying to write 
> my own loops and recursion and failing to do it well, I shop around and find 
> core.matrix with its cool proposal: a lot of numerical stuff in Clojure, 
> with pluggable implementations. Yahooo! My problem is almost solved. Go to 
> the main work right away:
>
> 1. I add it to my project and try the + example from the github page. It 
> works.
> 2. Now I start implementing my algorithm. How to add-and-multiply a few 
> matrices? THERE IS NO API DOC. I have to google and find 
> https://github.com/mikera/core.matrix/wiki/Vectors-vs.-matrices so I 
> guess it's mmul, but there is a lot of talk of some loosely related 
> implementation details. Column matrixes, slices, ndarrays... What? A lot of 
> implementation dependent info, almost no info on what I need (API).
> 3. I read the mailing list and the source code, and, if I manage to filter 
> API information from a lot of implementation discussion I manage to draw a 
> rough sketch of what I need (API).
> 4. I implement my algorithm with the default implementation (vectorz) and 
> it works. I measure the performance, and as soon as the data size becomes a 
> little more serious, it's too slow. No problem - pluggable implementations 
> are here. Surely that Clatrix thing must be blazingly fast, it's native. I 
> switch the implementations in no time, and get even poorer performance. 
> WHAT?
> 5. I try to find help on the mailing list. I was using the implementation 
> in a wrong way. WHY? It was all right with vectorz! Well, we didn't quite 
> implement it fully. A lot of functions are fallbacks. The implementation 
> is not suitable for that particular call... Seriously? It's featured on the 
> front page!
> 6. But, what is the right way to use it? I want to learn. THERE IS NO 
> INFO. But, look at this, you can treat a Clojure vector as a quaternion and 
> multiply it with a JSON hash-map, which is treated as a matrix of 
> characters (OK, I am exaggerating, but not that much :)
> etc, etc... 
>  
>
>> But it certainly isn't "arbitrarily invented". Please note that we have 
>> collectively considered a *lot* of previous work in the development of 
>> core.matrix. People involved in the design have had experience with BLAS, 
>> Fortran, NumPy, R, APL, numerous Java libraries, GPU acceleration, low 
>> level assembly coding etc. We'd welcome your contributions too.... but I 
>> hope you will first take the time to read the mailing list history etc. and 
>> gain an appreciation for the design decisions.
>>
>
> I read lots of those discussions before. I may or may not agree, fully or 
> partially, with what was written, but I see that the result is far from 
> what I find recommended in the numerical computing literature that I read, 
> and I do not see the core.matrix implementations proving that literature 
> wrong. 
>
>  
>>
>>>
>>> In my opinion, the best way to create a standard API is to grow it from 
>>> successful implementations, instead of writing it first, and then 
>>> shoehorning the implementations to fit it.
>>>
>>
>> It is (comparatively) easy to write an API for a specific implementation 
>> that supports a few specific operations and/or meets a specific use case. 
>> The original Clatrix is an example of one such library.
>>
>
> Can you point me to some of the implementations where switching the 
> implementation of an algorithm from vectorz to clatrix shows a performance 
> boost?
> And, easy? Surely then the Clatrix implementation would be fully 
> implemented and properly supported (and documented) 2-3 years after 
> it was included?
>  
>
>> But that soon falls apart when you realise that the API+implementation 
>> doesn't meet broader requirements, so you quickly get fragmentation e.g.
>> - someone else creates a pure-JVM API for those who can't use native code 
>> (e.g. vectorz-clj)
>>
>
> So, what is wrong with that? There are dozens of Clojure libraries for 
> SQL, http, visualization, etc, and all have their place.
>  
>
>> - someone else produces a similar library with a new API that wins on 
>> some benchmarks (e.g. Neanderthal)
>>
>
> I get your point, but would just note that Neanderthal wins *ALL* 
> benchmarks (that fit the use cases I need). Not because it is something too 
> clever, but because it stands on the shoulders of giants (ATLAS).
>  
>
>> - someone else needs arrays that support non-numerical scalar types (e.g. 
>> core.matrix NDArray)
>> - a library becomes unmaintained and someone forks a replacement
>> - someone wants to integrate a Java matrix library for legacy reasons
>> - someone else has a bad case of NIH syndrome and creates a whole new 
>> library
>>
>
> That could be said about virtually every application domain. Why are there 
> many http, html, javascript, database APIs? Why don't we have one API that 
> could be used for any existing library? It's not that people didn't try. I 
> prefer the microframework approach to a monolithic framework that has one 
> true way.
>  
>
>>
>> Before long you have a fragmented ecosystem with many libraries, many 
>> different APIs and many annoyed / confused users who can't easily get their 
>> tools to work together. Many of us have seen this happen before in other 
>> contexts, and we don't want to see the same thing happen for Clojure.
>>
>
> How many implementations of core.matrix work *WELL* together, for all 
> supported use cases today?
>  
>
>> core.matrix solves the problem of library fragmentation by providing a 
>> common abstract API, while allowing users choice over which underlying 
>> implementation suits their particular needs best. To my knowledge Clojure 
>> is the *only* language ecosystem that has developed such a capability, and 
>> it has already proved extremely useful for many users. 
>>
>
> How many choices are there today that fully and properly implement 
> core.matrix? 
>
>> So if you see people asking for Neanderthal to join the core.matrix 
>> ecosystem, hopefully this helps to explain why.
>>
>
> As I explained, I would *LOVE* to be able to do such an integration and 
> benefit from it myself. But, I failed to see how to do it properly, and 
> satisfy core.matrix goals (and my goals with Neanderthal at the same time).
>
> Currently, I see core.matrix as a formula: idea + ? = success
> I do not say that I do not like the idea generally. I would *LOVE* to see 
> such a thing. I do not see what the "?" is yet, and the current offerings do 
> not convince me that other people (core.matrix) can see it.
>  
>
>>
>>> a) I would rather see the core.matrix interoperability as an additional 
>>> separate project first, and when/if it shows its value, and there is a 
>>> person willing to maintain that part of the code, consider adding it to 
>>> Neanderthal. I wouldn't see it as second rate, and no fork is needed 
>>> because of Clojure's extend-type/extend-protocol mechanism. 
>>>
>>
>>  
>
>> While this could work from a technical perspective, I would encourage you 
>> to integrate core.matrix support directly into Neanderthal, for at least 
>> three reasons:
>> a) It will allow you to save the effort of creating and maintaining a 
>> whole duplicate API, when you can simply adopt the core.matrix API (for 
>> many operations)
>>
>
> If I could do it simply, I would have already done that. I do not have to 
> maintain a duplicate API now, though. I can maintain a simple Neanderthal 
> API that I understand, which is based on BLAS/LAPACK, with lots of 
> literature and know-how available online and offline, which does one thing 
> and (in my opinion) does it well, and leave core.matrix integration for 
> anyone that needs it.
>  
>
>> b) It will reduce maintenance, testing and deployment effort (for you and 
>> for others)
>>
>
> If core.matrix were a good fit. However, I have failed to see it that way 
> so far.
>  
>
>> c) You are much more likely to get outside contributors if the library 
>> forms a coherent whole and plays nicely with the rest of the ecosystem
>>
>
> That is true. However, I would rather have a library that fits my needs 
> well even if it attracts fewer people. And, I do not see how it is 
> difficult to integrate Neanderthal with other libraries, since I used it 
> with plotting libraries (clojure/java and external), and the integration 
> was straightforward.
>  
>
>> This really isn't hard - in the first instance it is just a matter of 
>> implementing a few core protocols. To get full performance, you would need 
>> to implement more of the protocols, but that could be added over time.
>>  
>>
>>> b) I am not sure about what's exactly "wrong" with core.matrix. Maybe 
>>> nothing is wrong. The first thing that I am interested in is what the 
>>> core.matrix team thinks is wrong with BLAS/LAPACK in the first place, so 
>>> that I can form an opinion in that regard
>>>
>>
>> BLAS/LAPACK is a low level implementation. core.matrix is a higher level 
>> abstraction of array programming. They simply aren't comparable in a 
>> meaningful way. It's like comparing the HTTP protocol with the Apache web 
>> server.
>>
>
> Here I have to disagree. BLAS/LAPACK is not a low level implementation. It 
> is a *DE FACTO STANDARD* for numerical linear algebra for dense matrices: 
> http://www.netlib.org/blas/blast-forum/blas-report.pdf. There are many 
> implementations of that standard, and they set a really high mark. Besides 
> ATLAS, there are Intel MKL, OpenBLAS, cuBLAS, clBLAS, and many other highly 
> performant libraries. All implementing an API that's been crafted for 
> decades and is as battle tested as a library could be.
>
>
>> You could certainly use BLAS/LAPACK to create a core.matrix 
>> implementation (which is roughly what Clatrix does, and what Neanderthal 
>> could do if it became a core.matrix implementation). Performance of this 
>> implementation should roughly match raw BLAS/LAPACK (all that core.matrix 
>> requires is the protocol dispatch overhead, which is pretty minimal and 
>> only O(1) per operation so it quickly becomes irrelevant for operations on 
>> large arrays).
>>
>
> Looking at the state of Clatrix integration, I have to disagree with that. 
> Certainly anybody *COULD* program anything (hypothetically), but I'd stay 
> more down to earth: what *IS*, and what are the reasons that it *IS NOT 
> (yet)*. 
>  
>
>>
>> In terms of API, core.matrix is *far* more powerful than BLAS/LAPACK. 
>>
>
> I do not agree. For *numerical linear algebra* it is not even close to 
> BLAS/LAPACK. I agree BLAS/LAPACK is not a good date parser, and I am glad 
> that it is not :)
>  
>
>> Some examples:
>> - Support for arbitrary N-dimensional arrays (slicing, reshaping, 
>> multi-dimensional transposes etc.)
>>
>
> And core.matrix is? Compared to NumPy? Compared to state of the art tensor 
> libraries? (Torch, etc.)
>  
>
>> - General purpose array programming operations (analogous to NumPy and 
>> APL)
>>
>
> See previous.
>  
>
>> - Independence from underlying implementation. You can support pure-JVM 
>> implementations (like vectorz-clj for example), native implementations, GPU 
>> implementations.
>>
>
> That could be said for any API. Neanderthal has interfaces, which I think 
> are simpler than core.matrix, and it also *can* support pure-JVM, native, 
> GPU. The question is: which of those things *ARE* supported? Or maybe the 
> answer to the question "why are they not supported" would answer why I do 
> not feel core.matrix is such a wonderful solution for a generally noble goal.
>  
>
>> - Support for arbitrary scalar types (complex numbers? strings? dates? 
>> quaternions anyone?)
>>
>
> Where is that support for complex numbers? (To remind you, BLAS/LAPACK 
> already has it, although I didn't implement that part in Neanderthal yet, 
> since I didn't need it).
>  
>
>> - Transparent support for both dense and sparse matrices with the same API
>>
>
> What is the performance of such operations? There is a reason dense/sparse 
> APIs are different in numerical libraries.
>  
>
>> - Support for both mutable and immutable arrays
>>
>
> While I support such a thing for the sake of effort in the name of elegance, 
> in *numerical computing* mutable structures are what is important, and that 
> is the thing that any literature stresses first. 
>  
>
>> - Transparent support for the in-built Clojure data structures (Clojure 
>> persistent vectors etc.)
>>
>
> Transparent is what worries me. Looks appealing, but shoots me in the 
> foot. Again, in *numerical computing*, when I want to convert something, I 
> want it to be explicit.
>  
>
>> - Support for mixing different array types
>> - Support for convenience operations such as broadcasting, coercion
>>
>
> See the previous point.
>  
>
>> If you build an API that supports all of that with a reasonably coherent 
>> design... then you'll probably end up with something very similar to 
>> core.matrix
>>
>
> On the other hand, I may think such an API an unnecessarily complex and 
> overblown thing. So, I guess that we just have a different perspective here.
>
> Thank you for the great effort, though. Even if I do not see that it fits 
> my needs, I am glad that it is a great tool for other people.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
