Re: The case for a Commons component

sebb Sun, 25 Apr 2021 07:28:01 -0700

I assume this thread is about the possible machine learning component.

If the code was developed by Commons, I assume it could be used as
part of Spark.
However Commons does not currently have many developers who are
familiar with the field.
So it would seem to me better to have development done by a project
which does have relevant experience.


You say that Spark etc have lots of jars.
Surely that allows for it to be implemented as a separate jar which
can either be used as part of the Spark platform, or used
independently?

The only other option I see is for Commons to persuade some developers
who are familiar with the field to join Commons to assist with the
algorithms.
Existing Commons developers can help manage the logistics of packaging
and releasing the code, as this does not require in-depth knowledge of
the design.
However, this only makes sense if the developers skilled in the area are
prepared to assist long-term.


On Sat, 24 Apr 2021 at 23:32, Paul King <[email protected]> wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats for a clustering example,
> comparing commons-math (KMeans) with Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating), if anyone would find that useful. It
> would no doubt lead to similar conclusions.
>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <[email protected]> wrote:
> >
> > Hello Paul.
> >
> > On Sat, 24 Apr 2021 at 04:42, Paul King <[email protected]> wrote:
> > >
> > > I added some more comments on whether the proposed algorithm
> > > belongs somewhere in the commons "math" area back in the Jira issue:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the mailing list is still the official forum for such a discussion,
> > I'm quoting part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has an 87M footprint across 85 jars and solves the problem in
> > over 40s. But I can also find more complex scenarios which need to scale
> > where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that time, there were people around who were
> > maintaining the clustering and GA code; hence, the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>

