Thanks Clément for this great post.
I'm really happy that we could put money aside (thanks Synectique) to
fund your PhD.
Stef
On 17/2/15 11:06, Clément Bera wrote:
Hello Andrea,
The way you wrote your algorithm is nice, but it makes extensive use of
closures and iterates a lot over collections. Those are two areas
where Pharo's performance has issues. Eliot Miranda and I are working
specifically on those two cases to improve Pharo's performance.
If you don't mind, I will add your algorithm to the benchmarks we use,
because it exercises exactly the cases we are trying to optimize, and
its results on the bleeding-edge VM are very encouraging.
About your implementation: someone familiar with Pharo would replace
#timesRepeat: with #to:do: in the two places where you use it.
For example:

run: points times: times
	1 to: times do: [ :i | self run: points ]
I don't think it makes the code much harder to read, and depending on
the number of iterations it can show real improvements, because
#to:do: is inlined at compile time. I tried it and got a 15% reduction
in overall run time on the bleeding-edge VM alone.
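For comparison, the #timesRepeat: form this replaces would be (a sketch reusing the names above):

```smalltalk
"Original style: #timesRepeat: is an ordinary message send that
activates a full block closure on every iteration."
run: points times: times
	times timesRepeat: [ self run: points ]
```

#to:do: with a literal block, by contrast, is compiled into a plain counting loop with no block activation, which is where the speedup comes from.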
Another thing is that #groupedBy: is almost never used in the system
and is really *not* optimized at all. Another collection protocol
might be faster without being less readable; I don't know.
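One sketch of such a replacement (assuming the grouping key is the nearest centroid; `closestCentroidTo:` is a hypothetical selector, not the benchmark's actual code) is an explicit Dictionary accumulation, which makes a single pass and avoids the generic grouping machinery:

```smalltalk
"Hypothetical one-pass replacement for
     points groupedBy: [ :p | self closestCentroidTo: p ]"
| groups |
groups := Dictionary new.
points do: [ :p |
	(groups
		at: (self closestCentroidTo: p)
		ifAbsentPut: [ OrderedCollection new ]) add: p ].
^ groups
```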
Now about solutions:
Firstly, the VM is getting faster.
The Pharo 4 VM, to be released in July 2015, should be at least 2x
faster than the current one. I tried it on your benchmark and got
5352.7 instead of 22629.1 on my machine, which is over a 4x
performance boost and puts Pharo between Factor and Clojure. An
alpha release is available here:
https://ci.inria.fr/pharo/view/4.0-VM-Spur/ . You need to use
PharoVM-spur32 as the VM and Pharo-spur32 as the image (yes, the image
changed too). You should be able to load your code, run your benchmark
and get a similar result.
In addition, for Pharo 5 we're working on making the VM much faster
again on benchmarks like yours. We hope to have an alpha release this
summer, but we can't say for sure it will be ready. For this second
step I'm at a point where I can barely run a benchmark without a
crash, so I can't tell you right now the exact performance to expect,
but barring a miracle it should land somewhere between PyPy and Scala
performance (it will reach full speed only once it matures, not in the
first release anyway). I don't think we'll reach the performance of
languages such as Nim or Rust any time soon. They're very different
from Pharo: direct compilation to machine code, many low-level types,
and so on. I'm not even sure a Java implementation could compete with
them.
Secondly, you can use bindings to native code instead. I showed here
how to write the code in C and bind it with a simple callout, which
may be what you need for your benchmark:
https://clementbera.wordpress.com/2013/06/19/optimizing-pharo-to-c-speed-with-nativeboost-ffi/
. Note that this way of calling C does not work on the latest VM.
There are three existing frameworks for calling C from Pharo, each
with pros and cons; we're trying to unify them, but it's taking time.
I believe the July release of Pharo 4 will come with an official
recommended way of calling C, and that's the one you should use.
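For reference, a NativeBoost-style callout (the mechanism described in the post above; the syntax here is from memory, so treat it as a sketch and check it against the post) binds a C function directly in a method body:

```smalltalk
"Hypothetical binding of the C library's abs() via NativeBoost."
abs: anInteger
	<primitive: #primitiveNativeCall module: #NativeBoostPlugin error: errorCode>
	^ self nbCall: #( int abs ( int anInteger ) )
```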
I hope this is a satisfying answer :-). I'm glad some people are
deeply interested in Pharo performance.
Best,
Clement
2015-02-17 9:03 GMT+01:00 Andrea Ferretti <ferrettiand...@gmail.com>:
Hi, a while ago I was evaluating Pharo as a platform for interactive
data exploration, mining and visualization.
I was fairly impressed by the tools offered by the Pharo distribution,
but I had a general feeling that the platform was a little slow, so I
decided to set up a small benchmark, given by an implementation of
K-means.
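To give an idea of the shape of the code (this is only an illustrative sketch, not the actual benchmark; all selectors here are hypothetical), one iteration of naive K-means in Smalltalk might read:

```smalltalk
"One naive K-means iteration: group points by nearest centroid,
then recompute each centroid as the mean of its cluster.
Illustrative only; selectors are hypothetical."
iterate: points centroids: centroids
	| clusters |
	clusters := points groupedBy: [ :p |
		centroids detectMin: [ :c | p dist: c ] ].
	^ clusters values collect: [ :cluster |
		(cluster inject: 0 @ 0 into: [ :sum :p | sum + p ]) / cluster size ]
```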
The original intention was to compare Pharo to Python (a language
often used in this niche) and Scala (the language we use in
production), but since then I have added implementations in a few
other languages as well. You can find the benchmark here:
https://github.com/andreaferretti/kmeans
Unfortunately, it turns out that Pharo is indeed the slowest among the
implementations that I have tried. Since I am not an expert on Pharo
or Smalltalk in general, I am asking for advice here to find out
whether I am doing something stupid.
To be clear: the aim is *not* to have an optimized version of K-means.
There are various ways to improve the algorithm that I am using, but I
am trying to get a feeling for the performance of an algorithm that a
casual user could implement without much thought while exploring some
data. So I am not looking for:
- better algorithms
- clever optimizations, such as, say, invoking native code
I am asking here because there is the real possibility that I am just
messing something up, and the same naive algorithm, written by someone
more competent, would show real improvements.
Please let me know if you find anything.
Best,
Andrea