Hi Kovas,

> By the way, I'd love to see matrix/tensor benchmarks of Neanderthal and 
> Vectorz vs ND4J, MXNet's NDArray, and BidMat..  :)
>

I don't have exact numbers, but will try to give you a few pointers to help 
you if you decide to investigate this further:

0. Neanderthal's scope is matrices and linear algebra. NNs and other stuff 
is something that could be built on top of it (assuming that the features 
needed are implemented, which may or may not be true yet), but certainly 
not in Neanderthal.

1. Neanderthal is a 100% Clojure solution. One of the main goals, other 
than the speed, of course, is that it is simple and straightforward, with 
no overhead. That means that you always know what backend you are using, 
and you get exactly the speed of that backend. If it works, you are sure it 
works at full speed, with no slow fallback. Theoretically, of course, 
there is always some FFI overhead, but in Neanderthal it is so minuscule 
that you can ignore it for every use that comes to my mind. So, basically, 
Neanderthal is as fast as ATLAS on the CPU and CLBlast on the GPU (both 
offer state-of-the-art speed), or as any (not-yet-existing) pure Java 
engine that I might plug in in the future if necessary.
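To make that "no hidden fallback" point concrete, here is a minimal sketch of what the native backend looks like in use. The namespaces and functions are Neanderthal's actual core API as I understand it, but treat the matrix values as purely illustrative:

```clojure
(require '[uncomplicate.neanderthal.core :refer [mm entry]]
         '[uncomplicate.neanderthal.native :refer [dge]])

;; dge creates a dense double matrix backed directly by native,
;; column-major memory, so it can be handed to BLAS with no copying.
(def a (dge 2 3 [1 2 3 4 5 6]))  ; 2x3, columns [1 2] [3 4] [5 6]
(def b (dge 3 2 [1 2 3 4 5 6]))  ; 3x2

;; mm calls straight into the native BLAS gemm routine;
;; there is no slow pure-JVM fallback path to accidentally hit.
(def c (mm a b))
(entry c 0 0)  ; => 22.0
```

Swapping in the GPU backend changes where the matrices live, not the shape of the code — which is the whole point about always knowing which backend you are running on.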

2. All those other libraries, besides not targeting Clojure at all (beyond 
the general "you can call Java from Clojure"), are trying to be everything 
for everyone. That has its strengths, because you are generally able to 
accommodate more use cases. On the other hand, it complicates things too 
much, and can lead to overblown beasts. For example, it might seem good to 
support MKL, and ATLAS, and OpenBLAS, and netlib BLAS, and some imaginary 
fallback solution, like ND4J does (or tries to do), but what is the point 
when today they all have more or less the same performance (MKL is a bit 
faster, by percentages only, but requires $$), and supporting all that 
stuff makes the code AND the installation much, much more complex? BLAS is 
so mature that I think it is better to choose one solution and offer it 
out of the box. Technically, Neanderthal could support all the other 
native BLAS libraries too, but I intentionally restricted that option 
because I think fiddling with it does more harm than good. I prefer to 
give users one Ford Model T rather than let them choose between 20 
different horse carriages. And they can even choose the color, provided 
that their choice is black :)

3. ND4J is, in my opinion, a typical overblown solution. The 
deeplearning4j guys got the investment dollars and have to rush to the 
market with a business-friendly solution, which usually favors having a 
checklist of features regardless of whether those features make sense for 
the little guy. I hope they succeed in the business sense, but the code 
I'm seeing from them does not make me optimistic that Java will get a 
great DL/NN library out of it.

4. NDArray is actually, as the name suggests, an nd-array library, and not 
a matrix library. Why is this important? Vectors and matrices are something 
that has been very well researched for decades. The ins and outs of the 
algorithm/architecture fit are known and implemented in BLAS libraries, so 
you are sure that you get the full performance. N-dimensional arrays 
(sometimes referred to as tensors, although that name is not accurate IMO), 
not so much. So it can easily happen that an operation that looks 
convenient does not perform well. I'm not saying that is bad, because if 
you need that operation, it is better to have something than nothing, but 
for now I decided not to support 3+ dimensions. This is something that 
might belong in Neanderthal, or on top of it; a long-term goal, to be sure. 
Another aspect of that story is knowledge: most books that I have read in 
the ML/AI fields give all formulas in terms of vectors and matrices. 
Basically, matrices cover at least 95% (if not more) of what potential 
users need or even understand!
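As a small illustration of that last point: a typical textbook formula such as the affine layer y = Wx + b maps directly onto 1D and 2D BLAS operations, with no 3+ dimensional structures in sight. A hedged sketch using Neanderthal's core namespace (the weight and bias values here are made up for the example):

```clojure
(require '[uncomplicate.neanderthal.core :refer [mv xpy entry]]
         '[uncomplicate.neanderthal.native :refer [dge dv]])

;; y = Wx + b, the bread-and-butter formula of ML texts,
;; expressed with only vectors and a matrix.
(def W (dge 2 3 [1 0 0 1 1 1]))  ; 2x3 weights, column-major
(def x (dv 1 2 3))               ; input vector
(def b (dv 0.5 0.5))             ; bias

;; mv maps to BLAS gemv, xpy sums the vectors.
(def y (xpy b (mv W x)))
(entry y 0)  ; => 4.5
```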

5. BidMat seems to have a much larger scope. For example, on their 
benchmark page I see benchmarks for machine learning algorithms, but 
nothing matrix-y.

As for the speed comparison, it boils down to this: both Neanderthal and 
those libraries use (or can be linked to use) the same native BLAS 
libraries. I took great care to make sure Neanderthal does not incur any 
copying or calling overhead. From what I saw glancing at the code of the 
other libraries, they did not take the same care. They might avoid that 
overhead if you set everything up just right among lots of options, or 
they might not, if you do not know how to ensure it. So I doubt any of 
those could be noticeably faster, and they can be much slower if you slip 
somewhere.
I would also love to see straightforward numbers, but I was unable to find 
anything like that for those libraries. BidMat, for example, gives 
benchmarks of K-Means on the MNIST dataset; I do not see how that can be 
used to discern how fast it is with matrices, other than that it is, 
generally, fast at K-Means.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en