On Monday, June 22, 2015 at 2:02:19 AM UTC+2, Mikera wrote:
>
> There is nothing fundamentally wrong with BLAS/LAPACK, it just isn't
> suitable as a general purpose array programming API. See my comments
> further below.
>
I was discussing it from the *matrix API* perspective. My comments follow:

> If you think the core.matrix API is "unintuitive and complicated" then I'd
> love to hear specific examples. We're still open to changing things before
> we hit 1.0
>

I will only give a couple of basic ones, but I think they paint a bigger picture.

Let's say I am a Clojure programmer without much experience in numerical computing. I do have some knowledge of linear algebra and a textbook or a paper with an algorithm I need, which is built on linear algebra operations. I'd say this is the most common use case for an API such as core.matrix, and I hope you agree. After trying to write my own loops and recursion and failing to do it well, I shop around and find core.matrix with its cool proposal: lots of numerical stuff in Clojure, with pluggable implementations. Yahooo! My problem is almost solved. I go straight to the main work:

1. I add it to my project and try the + example from the GitHub page. It works.
2. Now I start implementing my algorithm. How do I add-and-multiply a few matrices? THERE IS NO API DOC. I have to google and find https://github.com/mikera/core.matrix/wiki/Vectors-vs.-matrices, so I guess it's mmul, but there is a lot of talk about loosely related implementation details. Column matrices, slices, ndarrays... What? A lot of implementation-dependent info, almost no info on what I need (the API).
3. I read the mailing list and the source code, and, if I manage to filter the API information out of a lot of implementation discussion, I draw a rough sketch of what I need (the API).
4. I implement my algorithm with the default implementation (vectorz) and it works. I measure the performance, and as soon as the data size becomes a little more serious, it's too slow. No problem - pluggable implementations are here. Surely that Clatrix thing must be blazingly fast, it's native. I switch implementations in no time, and get even poorer performance. WHAT?
5. I try to find help on the mailing list. I was using the implementation the wrong way. WHY? It was all right with vectorz! Well, we didn't quite implement it fully. A lot of functions are fallbacks. The implementation is not suitable for that particular call... Seriously? It's featured on the front page!
6. But what is the right way to use it? I want to learn. THERE IS NO INFO. But look at this: you can treat a Clojure vector as a quaternion and multiply it with a JSON hash-map, which is treated as a matrix of characters (OK, I am exaggerating, but not that much :) etc, etc...

> But it certainly isn't "arbitrarily invented". Please note that we have
> collectively considered a *lot* of previous work in the development of
> core.matrix. People involved in the design have had experience with BLAS,
> Fortran, NumPy, R, APL, numerous Java libraries, GPU acceleration, low
> level assembly coding etc. We'd welcome your contributions too.... but I
> hope you will first take the time to read the mailing list history etc. and
> gain an appreciation for the design decisions.
>

I have read lots of those discussions before. I may or may not agree with what was written, fully or partially, but I see that the result is far from what the numerical computing literature I have read recommends, and I do not see the core.matrix implementations proving that literature wrong.

>
>>
>> In my opinion, the best way to create a standard API is to grow it from
>> successful implementations, instead of writing it first, and then
>> shoehorning the implementations to fit it.
>>
>
> It is (comparatively) easy to write an API for a specific implementation
> that supports a few specific operations and/or meets a specific use case.
> The original Clatrix is an example of one such library.
>

Can you point me to some of the implementations where switching an algorithm's implementation from vectorz to Clatrix shows a performance boost? And, easy?
Surely, then, the Clatrix implementation would be fully implemented and properly supported (and documented) 2-3 years after it was included?

> But that soon falls apart when you realise that the API+implementation
> doesn't meet broader requirements, so you quickly get fragmentation e.g.
> - someone else creates a pure-JVM API for those who can't use native code
> (e.g. vectorz-clj)
>

So, what is wrong with that? There are dozens of Clojure libraries for SQL, HTTP, visualization, etc., and all have their place.

> - someone else produces a similar library with a new API that wins on some
> benchmarks (e.g. Neanderthal)
>

I get your point, but I would just note that Neanderthal wins *ALL* benchmarks (that fit the use cases I need). Not because it is particularly clever, but because it stands on the shoulders of giants (ATLAS).

> - someone else needs arrays that support non-numerical scalar types (e.g.
> core.matrix NDArray)
> - a library becomes unmaintained and someone forks a replacement
> - someone wants to integrate a Java matrix library for legacy reasons
> - someone else has a bad case of NIH syndrome and creates a whole new
> library
>

That could be said about virtually every application domain. Why are there so many HTTP, HTML, JavaScript, and database APIs? Why don't we have one API that could be used with any existing library? It's not that people didn't try. I prefer the microframework approach to a monolithic framework that has one true way.

>
> Before long you have a fragmented ecosystem with many libraries, many
> different APIs and many annoyed / confused users who can't easily get their
> tools to work together. Many of us have seen this happen before in other
> contexts, and we don't want to see the same thing to happen for Clojure.
>

How many implementations of core.matrix work *WELL* together, for all supported use cases, today?
> core.matrix solves the problem of library fragmentation by providing a
> common abstract API, while allowing users choice over which underlying
> implementation suits their particular needs best. To my knowledge Clojure
> is the *only* language ecosystem that has developed such a capability, and
> it has already proved extremely useful for many users.
>

How many choices are there today that fully and properly implement core.matrix?

> So if you see people asking for Neanderthal to join the core.matrix
> ecosystem, hopefully this helps to explain why.
>

As I explained, I would *LOVE* to be able to do such an integration and benefit from it myself. But I failed to see how to do it properly and satisfy core.matrix's goals (and my goals with Neanderthal) at the same time.

Currently, I see core.matrix as a formula:

idea + ? = success

I am not saying that I dislike the idea in general. I would *LOVE* to see such a thing. I do not see what the "?" is yet, and the current offering does not convince me that other people (the core.matrix team) can see it either.

>
>> a) I would rather see the core.matrix interoperability as an additional
>> separate project first, and when/if it shows its value, and there is a
>> person willing to maintain that part of the code, consider adding it to
>> Neanderthal. I wouldn't see it as a second rate, and no fork is needed
>> because of Clojure's extend-type/extend-protocol mechanism.
>>
>
>
> While this could work from a technical perspective, I would encourage you
> to integrate core.matrix support directly into Neanderthal, for at least
> three reasons:
> a) It will allow you to save the effort of creating and maintaining a
> whole duplicate API, when you can simply adopt the core.matrix API (for
> many operations)
>

If I could do it simply, I would have done it already. I do not have to maintain a duplicate API now, though.
I can maintain a simple Neanderthal API that I understand, which is based on BLAS/LAPACK, with lots of literature and know-how available online and offline, which does one thing and (in my opinion) does it well, and leave core.matrix integration to anyone who needs it.

> b) It will reduce maintenance, testing and deployment effort (for you and
> for others)
>

Only if core.matrix were a good fit. So far, however, I have failed to see it that way.

> c) You are much more likely to get outside contributors if the library
> forms a coherent whole and plays nicely with the rest of the ecosystem
>

That is true. However, I would rather have a library that fits my needs well, even if it attracts fewer people. And I do not see why it would be difficult to integrate Neanderthal with other libraries, since I have used it with plotting libraries (Clojure/Java and external), and the integration was straightforward.

> This really isn't hard - in the first instance it is just a matter of
> implementing a few core protocols. To get full performance, you would need
> to implement more of the protocols, but that could be added over time.
>
>
>> b) I am not sure about what's exactly "wrong" with core.matrix. Maybe
>> nothing is wrong. The first thing that I am interested in is what do
>> core.matrix team think is wrong with BLAS/LAPACK in the first place to be
>> able to form an opinion in that regard
>>
>
> BLAS/LAPACK is a low level implementation. core.matrix is a higher level
> abstraction of array programming. They simply aren't comparable in a
> meaningful way. It's like comparing the HTTP protocol with the Apache web
> server.
>

Here I have to disagree. BLAS/LAPACK is not a low-level implementation. It is a *DE FACTO STANDARD* for numerical linear algebra on dense matrices: http://www.netlib.org/blas/blast-forum/blas-report.pdf. There are many implementations of that standard, and they set a really high bar.
Besides ATLAS, there are Intel MKL, OpenBLAS, cuBLAS, clBLAS, and many other high-performance libraries, all implementing an API that has been crafted over decades and is as battle-tested as a library can be.

> You could certainly use BLAS/LAPACK to create a core.matrix implementation
> (which is roughly what Clatrix does, and what Neanderthal could do if it
> became a core.matrix implementation). Performance of this implementation
> should roughly match raw BLAS/LAPACK (all that core.matrix requires is the
> protocol dispatch overhead, which is pretty minimal and only O(1) per
> operation so it quickly becomes irrelevant for operations on large arrays).
>

Looking at the state of the Clatrix integration, I have to disagree with that. Certainly anybody *COULD* program anything (hypothetically), but I'd rather stay down to earth: what *IS*, and what are the reasons that it *IS NOT (yet)*.

>
> In terms of API, core.matrix is *far* more powerful than BLAS/LAPACK.
>

I do not agree. For *numerical linear algebra* it is not even close to BLAS/LAPACK. I agree that BLAS/LAPACK is not a good date parser, and I am glad that it is not :)

> Some examples:
> - Support for arbitrary N-dimensional arrays (slicing, reshaping,
> multi-dimensional transposes etc.)
>

And how does core.matrix compare here? To NumPy? To state-of-the-art tensor libraries (Torch, etc.)?

> - General purpose array programming operations (analogous to NumPy and APL)
>

See previous.

> - Independence from underlying implementation. You can support pure-JVM
> implementations (like vectorz-clj for example), native implementations, GPU
> implementations.
>

That could be said of any API. Neanderthal has interfaces, which I think are simpler than core.matrix's, and it also *can* support pure-JVM, native, and GPU implementations. The question is: which of those things *ARE* supported? Or maybe the answer to the question "why are they not supported?" would explain why I do not feel that core.matrix is such a wonderful solution for a generally noble goal.
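To make the "one interface, many backends" point concrete in BLAS terms, here is a minimal Java sketch. It assumes a hypothetical `Blas` interface (not Neanderthal's or core.matrix's actual API) that exposes only the standard BLAS GEMM contract, C := alpha * A * B + beta * C, which is also the "add-and-multiply" operation from step 2 above. `NaiveJvmBlas` is a hypothetical, deliberately naive pure-JVM backend; a native (ATLAS, MKL, OpenBLAS) or GPU backend would implement the same interface with an optimized kernel.

```java
import java.util.Arrays;

/**
 * Hypothetical interface (not Neanderthal's or core.matrix's API) exposing only
 * the standard BLAS GEMM contract: C := alpha * A * B + beta * C.
 * Matrices are column-major with no transposes or leading-dimension padding:
 * A is m x k, B is k x n, C is m x n; element (i, j) of an r-row matrix is at [i + j * r].
 */
interface Blas {
    void gemm(int m, int n, int k, double alpha, double[] a, double[] b, double beta, double[] c);
}

/** Hypothetical naive pure-JVM backend; a native or GPU backend would implement the same interface. */
final class NaiveJvmBlas implements Blas {
    @Override
    public void gemm(int m, int n, int k, double alpha, double[] a, double[] b, double beta, double[] c) {
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    sum += a[i + p * m] * b[p + j * k];
                }
                // As in reference BLAS, beta == 0 means the old contents of C are ignored (even NaNs).
                double old = (beta == 0.0) ? 0.0 : beta * c[i + j * m];
                c[i + j * m] = alpha * sum + old; // C is updated in place
            }
        }
    }
}

public class GemmDemo {
    public static void main(String[] args) {
        Blas blas = new NaiveJvmBlas();
        double[] a = {1, 3, 2, 4}; // A = [1 2; 3 4], column-major
        double[] b = {5, 7, 6, 8}; // B = [5 6; 7 8], column-major
        double[] c = {1, 1, 1, 1}; // C = all ones
        // "Add-and-multiply" in one call: C := 2 * A * B + 1 * C
        blas.gemm(2, 2, 2, 2.0, a, b, 1.0, c);
        System.out.println(Arrays.toString(c)); // prints [39.0, 87.0, 45.0, 101.0]
    }
}
```

The column-major layout and the in-place update of C follow the BLAS convention; the real `dgemm` additionally takes transpose flags and leading-dimension arguments (LDA, LDB, LDC), which this sketch omits for brevity.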
> - Support for arbitrary scalar types (complex numbers? strings? dates?
> quaternions anyone?)
>

Where is that support for complex numbers? (To remind you, BLAS/LAPACK already has it, although I haven't implemented that part in Neanderthal yet, since I didn't need it.)

> - Transparent support for both dense and sparse matrices with the same API
>

What is the performance of such operations? There is a reason dense and sparse APIs are separate in numerical libraries.

> - Support for both mutable and immutable arrays
>

While I support such an effort in the name of elegance, in *numerical computing* mutable structures are what matters, and that is the first thing any literature stresses.

> - Transparent support for the in-built Clojure data structures (Clojure
> persistent vectors etc.)
>

"Transparent" is what worries me. It looks appealing, but it shoots me in the foot. Again, in *numerical computing*, when I want to convert something, I want it to be explicit.

> - Support for mixing different array types
> - Supports for convenience operations such as broadcasting, coercion
>

See the previous point.

> If you build an API that supports all of that with a reasonably coherent
> design... then you'll probably end up with something very similar to
> core.matrix
>

On the other hand, I may consider such an API unnecessarily complex and overblown. So, I guess we just have different perspectives here.

Thank you for the great effort, though. Even if I do not see that it fits my needs, I am glad that it is a great tool for other people.

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en