On Saturday, 20 June 2015 08:43:39 UTC+1, Dragan Djuric wrote:
>
> On Friday, June 19, 2015 at 11:17:02 PM UTC+2, Christopher Small wrote:
>>
>> I see now Dragan; you're concerned not about whether easily implementing 
>> and swapping in/out implementations of core.matrix is possible, but whether 
>> it can be done while maintaining the performance characteristics of 
>> Neanderthal, yes? That did not come through in your earlier comments in 
>> this thread.
>>
>
> This, with the addition that for *any* library, not only Neanderthal, 
> there would be many leaking abstractions. It is easy to define common 
> function/method names and parameters, but there are many things that just 
> flow through the API regardless, and taming this is the hardest part of any 
> API.
>
 
>
>>
>> Certainly, performance is one of those things that can leak in an 
>> abstraction. But I'd like to echo Matt's enquiry: If you think a unified 
>> API might be possible but that core.matrix isn't it, I think we'd all love 
>> to hear what you think it's missing and/or what would would need to be 
>> rearchitected in order for it to fit the bill.
>>
>
> For a unified API, if it is at all feasible, I think there is one place 
> it should be looked at first: BLAS 1, 2, 3 and LAPACK. This is THE de facto 
> standard for matrix computations for dense and banded matrices. Sparse APIs 
> are not that uniform, but in that space, also, there is a lot of previous 
> work. So, what's wrong with BLAS/LAPACK that core.matrix chose not to 
> follow it, and instead arbitrarily invented an (in my opinion) unintuitive 
> and complicated API? I am genuinely interested; maybe I don't see something 
> that other people do. 
>

There is nothing fundamentally wrong with BLAS/LAPACK, it just isn't 
suitable as a general purpose array programming API. See my comments 
further below.

If you think the core.matrix API is "unintuitive and complicated" then I'd 
love to hear specific examples. We're still open to changing things before 
we hit 1.0.

But it certainly isn't "arbitrarily invented". Please note that we have 
collectively considered a *lot* of previous work in the development of 
core.matrix. People involved in the design have experience with BLAS, 
Fortran, NumPy, R, APL, numerous Java libraries, GPU acceleration, low-level 
assembly coding etc. We'd welcome your contributions too... but I 
hope you will first take the time to read the mailing list history and 
gain an appreciation for the design decisions.

 

>
> In my opinion, the best way to create a standard API is to grow it from 
> successful implementations, instead of writing it first, and then 
> shoehorning the implementations to fit it.
>

It is (comparatively) easy to write an API for a specific implementation 
that supports a few specific operations and/or meets a specific use case. 
The original Clatrix is an example of one such library.

But that soon falls apart when you realise that the API+implementation 
doesn't meet broader requirements, so you quickly get fragmentation, e.g.:
- someone else creates a pure-JVM API for those who can't use native code 
(e.g. vectorz-clj)
- someone else produces a similar library with a new API that wins on some 
benchmarks (e.g. Neanderthal)
- someone else needs arrays that support non-numerical scalar types (e.g. 
core.matrix NDArray)
- a library becomes unmaintained and someone forks a replacement
- someone wants to integrate a Java matrix library for legacy reasons
- someone else has a bad case of NIH syndrome and creates a whole new 
library
- etc.

Before long you have a fragmented ecosystem with many libraries, many 
different APIs and many annoyed / confused users who can't easily get their 
tools to work together. Many of us have seen this happen before in other 
contexts, and we don't want to see the same thing happen to Clojure.

core.matrix solves the problem of library fragmentation by providing a 
common abstract API, while allowing users choice over which underlying 
implementation suits their particular needs best. To my knowledge Clojure 
is the *only* language ecosystem that has developed such a capability, and 
it has already proved extremely useful for many users. 

So if you see people asking for Neanderthal to join the core.matrix 
ecosystem, hopefully this helps to explain why.
 

>  
>
>>
>> As for any sort of "responsibility" to implement core.matrix, I don't 
>> think anyone is arguing you have such a responsibility, and I hope our 
>> _pleading_ hasn't come across as such. We are simply impressed with your 
>> work, and would like to take advantage of it, but also see a "drawback" you 
>> don't: at present Neanderthal is less interoperable with many existing 
>> tools, and "trying it out" on an existing project would require a rewrite 
>> (as would migrating away from it if we weren't happy).
>>
>> Certainly, a third party library implementing core.matrix with 
>> Neanderthal is a possibility, but I'm a bit worried that a) it would add 
>> extra burden keeping things in sync and feel a little second class; and 
>> more importantly b) it might be easier to maintain more of the performance 
>> benefits if it's directly integrated (I could imagine less indirection 
>> this way, but could be totally wrong). So let me ask you this:
>>
>> Assuming a) someone forks Neanderthal and makes a core.matrix 
>> implementation with close performance parity to the direct Neanderthal API 
>> and/or b) folks working on core.matrix are able to address some of your 
>> issues with the core.matrix architecture, would you consider a merge?
>>
>
> a) I would rather see the core.matrix interoperability as an additional 
> separate project first, and when/if it shows its value, and there is a 
> person willing to maintain that part of the code, consider adding it to 
> Neanderthal. I wouldn't see it as second-rate, and no fork is needed 
> because of Clojure's extend-type/extend-protocol mechanism. 
>

While this could work from a technical perspective, I would encourage you 
to integrate core.matrix support directly into Neanderthal, for at least 
three reasons:
a) It will allow you to save the effort of creating and maintaining a whole 
duplicate API, when you can simply adopt the core.matrix API (for many 
operations)
b) It will reduce maintenance, testing and deployment effort (for you and 
for others)
c) You are much more likely to get outside contributors if the library 
forms a coherent whole and plays nicely with the rest of the ecosystem

This really isn't hard - in the first instance it is just a matter of 
implementing a few core protocols. To get full performance, you would need 
to implement more of the protocols, but that could be added over time.
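To make that concrete, here is a rough sketch of what such a bridge could look like. `MyMatrix` is a hypothetical stand-in for a native-backed matrix type (it is not an actual Neanderthal class), but the protocol names are the real ones from `clojure.core.matrix.protocols`:

```clojure
(ns neanderthal.core-matrix-bridge
  "Hypothetical sketch: exposing a custom matrix type to core.matrix
   via extend-type, without forking anything."
  (:require [clojure.core.matrix.protocols :as mp]))

;; Toy 2D matrix backed by a flat, row-major double array.
;; Stands in for a real native matrix type.
(deftype MyMatrix [^doubles data ^long rows ^long cols])

(extend-type MyMatrix
  mp/PDimensionInfo
  (dimensionality [m] 2)
  (get-shape [m] [(.rows m) (.cols m)])
  (is-scalar? [m] false)
  (is-vector? [m] false)
  (dimension-count [m dim]
    (case (long dim) 0 (.rows m) 1 (.cols m)))

  mp/PIndexedAccess
  (get-2d [m r c]
    (aget ^doubles (.data m) (+ (* (long r) (.cols m)) (long c))))
  (get-1d [m i]
    ;; flat index, for illustration only
    (aget ^doubles (.data m) (long i)))
  (get-nd [m idxs]
    (let [[r c] idxs] (mp/get-2d m r c))))
```

Implementing just `PDimensionInfo` and `PIndexedAccess` already makes a type usable through the generic (if slow) default implementations of most operations; the fast paths come from implementing further protocols such as `PMatrixMultiply` with your own optimised code.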
 

> b) I am not sure about what's exactly "wrong" with core.matrix. Maybe 
> nothing is wrong. The first thing I am interested in is what the 
> core.matrix team thinks is wrong with BLAS/LAPACK in the first place, so 
> that I can form an opinion in that regard.
>

BLAS/LAPACK is a low level implementation. core.matrix is a higher level 
abstraction of array programming. They simply aren't comparable in a 
meaningful way. It's like comparing the HTTP protocol with the Apache web 
server.

You could certainly use BLAS/LAPACK to create a core.matrix implementation 
(which is roughly what Clatrix does, and what Neanderthal could do if it 
became a core.matrix implementation). Performance of such an implementation 
should roughly match raw BLAS/LAPACK: all that core.matrix adds is 
protocol dispatch overhead, which is pretty minimal and only O(1) per 
operation, so it quickly becomes negligible for operations on large arrays.

In terms of API, core.matrix is *far* more powerful than BLAS/LAPACK. Some 
examples:
- Support for arbitrary N-dimensional arrays (slicing, reshaping, 
multi-dimensional transposes etc.)
- General purpose array programming operations (analogous to NumPy and APL)
- Independence from underlying implementation. You can support pure-JVM 
implementations (like vectorz-clj for example), native implementations, GPU 
implementations.
- Support for arbitrary scalar types (complex numbers? strings? dates? 
quaternions anyone?)
- Transparent support for both dense and sparse matrices with the same API
- Support for both mutable and immutable arrays
- Transparent support for the in-built Clojure data structures (Clojure 
persistent vectors etc.)
- Support for mixing different array types
- Support for convenience operations such as broadcasting and coercion
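For a taste of what that list means in practice, here is a small sketch using the public `clojure.core.matrix` namespace, operating on plain nested Clojure vectors via the default implementation:

```clojure
(require '[clojure.core.matrix :as m])

;; Plain nested Clojure vectors are valid core.matrix arrays
(m/shape [[1 2 3] [4 5 6]])      ;; => [2 3]
(m/transpose [[1 2] [3 4]])      ;; => [[1 3] [2 4]]

;; Matrix multiplication through the same generic API
(m/mmul [[1 2] [3 4]] [[1 0] [0 1]])

;; Broadcasting: the scalar 10 is expanded to match the matrix shape
(m/add [[1 2] [3 4]] 10)         ;; => [[11 12] [13 14]]
```

The same calls work unchanged if the arguments are vectorz-clj arrays, NDArrays, or any other registered implementation, which is the whole point of the abstraction.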

If you build an API that supports all of that with a reasonably coherent 
design... then you'll probably end up with something very similar to 
core.matrix.







-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en