I assume this thread is about the possible ML component. If the code were developed by Commons, I assume it could be used as part of Spark. However, Commons does not currently have many developers who are familiar with the field, so it seems to me better to have the development done by a project which does have the relevant experience.
You say that Spark etc. have lots of jars. Surely that allows for it to be implemented as a separate jar which can either be used as part of the Spark platform, or used independently?

The only other option I see is for Commons to persuade some developers who are familiar with the field to join Commons to assist with the algorithms. Existing Commons developers can help manage the logistics of packaging and releasing the code, as this does not require in-depth knowledge of the design. However, this only makes sense if the developers skilled in the area are prepared to assist long-term.

On Sat, 24 Apr 2021 at 23:32, Paul King <paul.king.as...@gmail.com> wrote:
>
> Thanks Gilles,
>
> I can provide the same sort of stats across a clustering example
> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
> would no doubt lead to similar conclusions.
>
> Cheers, Paul.
>
> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > Hello Paul.
> >
> > On Sat, 24 Apr 2021 at 04:42, Paul King <paul.king.as...@gmail.com> wrote:
> > >
> > > I added some more comments relevant to if the proposed algorithm
> > > belongs somewhere in the commons "math" area back in the Jira:
> > >
> > > https://issues.apache.org/jira/browse/MATH-1563
> >
> > Thanks for a "real" user's testimony.
> >
> > As the ML is still the official forum for such a discussion, I'm quoting
> > part of your post on JIRA:
> > ---CUT---
> > For linear regression, taking just one example dataset, commons-math
> > is a couple of library calls for a single 2M library and solves the
> > problem in 240ms. Both Ignite and Spark involve "firing up the
> > platform" and the code is more complex for simple scenarios. Spark has
> > a 181M footprint across 210 jars and solves the problem in about 20s.
> > Ignite has an 87M footprint across 85 jars and solves the problem in > 40s.
> > But I can also find more complex scenarios which need to scale
> > where Ignite and Spark really come into their own.
> > ---CUT---
> >
> > A similar rationale was behind my developing/using the SOFM
> > functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
> > proof of concept, and taking the "lightweight" path seemed more
> > effective than experimenting with those platforms.
> > Admittedly, at that epoch, there were people around, who were
> > maintaining the clustering and GA codes; hence, the prototyping
> > of a machine-learning library didn't look strange to anyone.
> >
> > Regards,
> > Gilles
> >
> > >>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org