Hi, a while ago I was evaluating Pharo as a platform for interactive
data exploration, mining and visualization.

I was fairly impressed by the tools offered by the Pharo distribution,
but I had a general feeling that the platform was a little slow, so I
decided to set up a small benchmark: an implementation of K-means.

The original intention was to compare Pharo to Python (a language that
is often used in this niche) and Scala (the language that we use in
production), but since then I have implemented it in a few other
languages as well. You can find the benchmark here:

https://github.com/andreaferretti/kmeans

Unfortunately, it turns out that Pharo is indeed the slowest among the
implementations that I have tried. Since I am not an expert on Pharo
or Smalltalk in general, I am asking for advice here, to find out
whether I am doing something stupid.

To be clear: the aim is *not* to have an optimized version of K-means.
There are various ways to improve the algorithm that I am using, but I
am trying to get a feeling for the performance of an algorithm that a
casual user could implement without much thought while exploring some
data. So I am not looking for:

- better algorithms
- clever optimizations, such as, say, invoking native code
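For reference, the kind of naive implementation I have in mind looks
roughly like the following. This is only a sketch in plain Python of
textbook K-means (Lloyd's algorithm) on 2-D points, not the code from
the repository; the function names, the fixed iteration count and the
random initialization are my own assumptions for illustration.

```python
import math
import random

def dist(p, q):
    # Euclidean distance between two 2-D points
    return math.hypot(p[0] - q[0], p[1] - q[1])

def average(points):
    # Component-wise mean of a non-empty list of 2-D points
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def closest(p, centroids):
    # Index of the centroid nearest to p (ties go to the lowest index)
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

def kmeans(points, k, iterations=15, seed=0):
    # Naive Lloyd's algorithm: fixed number of iterations, no convergence test
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty
        centroids = [average(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids
```

Nothing clever is going on: plain lists and tuples, a distance function
and repeated passes over the data, which is what a casual user would
write while exploring.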

I am asking here because there is the real possibility that I am just
messing something up, and the same naive algorithm, written by someone
more competent, would show real improvements.

Please let me know if you find anything.
Best,
Andrea
