On Sun., Apr. 25, 2021 at 00:32, Paul King <paul.king.as...@gmail.com> wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats for a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.
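(For context, the commons-math side of such a comparison amounts to a
few library calls. Below is a minimal sketch against the CM 3.6.1
"o.a.c.m.ml.clustering" API; the data points are made up for
illustration, not taken from any benchmark.)

import java.util.Arrays;
import java.util.List;

import org.apache.commons.math3.ml.clustering.CentroidCluster;
import org.apache.commons.math3.ml.clustering.DoublePoint;
import org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer;

public class KMeansSketch {
    public static void main(String[] args) {
        // Toy 2-D points (placeholders for a real dataset).
        List<DoublePoint> points = Arrays.asList(
            new DoublePoint(new double[] { 1.0, 1.0 }),
            new DoublePoint(new double[] { 1.5, 2.0 }),
            new DoublePoint(new double[] { 8.0, 8.0 }),
            new DoublePoint(new double[] { 8.5, 9.0 }));

        // k-means++ with k = 2; Euclidean distance by default.
        KMeansPlusPlusClusterer<DoublePoint> clusterer =
            new KMeansPlusPlusClusterer<>(2);

        for (CentroidCluster<DoublePoint> c : clusterer.cluster(points)) {
            System.out.println("center: "
                + Arrays.toString(c.getCenter().getPoint())
                + ", size: " + c.getPoints().size());
        }
    }
}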
There were also relatively recent discussions concerning the code in
the "o.a.c.m.ml.clustering" package.[1]  If it is useful as of the old
CM v3.6.1, it can very probably be improved upon in terms of
flexibility[2] and performance through (among other things)
multi-threading (in much the same way as for GA, I guess).

Best regards,
Gilles

[1] https://issues.apache.org/jira/browse/MATH-1515
[2] Fixes and enhancements are already in the CM "master" branch.

>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > Hello Paul.
> >
> > On Sat., Apr. 24, 2021 at 04:42, Paul King <paul.king.as...@gmail.com> wrote:
> > >
> > > I added some more comments, relevant to whether the proposed
> > > algorithm belongs somewhere in the commons "math" area, back in
> > > the Jira:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the ML is still the official forum for such a discussion, I'm quoting
> > part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has an 87M footprint across 85 jars and solves the problem
> > in >40s. But I can also find more complex scenarios which need to
> > scale, where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that time, there were people around who were
> > maintaining the clustering and GA codes; hence the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]
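For anyone who wants to reproduce the commons-math side of the
linear-regression comparison quoted above: it is indeed just a couple
of library calls. Here is a minimal sketch against the CM 3.6.1
"SimpleRegression" API; the observations below are placeholders, not
the benchmark dataset from the JIRA post.

import org.apache.commons.math3.stat.regression.SimpleRegression;

public class RegressionSketch {
    public static void main(String[] args) {
        SimpleRegression reg = new SimpleRegression();

        // Placeholder (x, y) observations; substitute real data here.
        reg.addData(1.0, 2.0);
        reg.addData(2.0, 4.1);
        reg.addData(3.0, 5.9);
        reg.addData(4.0, 8.2);

        // Fitted line and goodness of fit.
        System.out.printf("slope=%.3f intercept=%.3f r^2=%.3f%n",
            reg.getSlope(), reg.getIntercept(), reg.getRSquare());
    }
}

For several predictors, "OLSMultipleLinearRegression" in the same
"o.a.c.m.stat.regression" package offers a similarly small API.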