How many committers will be active for this component?

Ralph
> On Apr 26, 2021, at 7:17 AM, Avijit Basak <avijit.ba...@gmail.com> wrote:
>
> Hi
>
>         As per previous discussions, I have created a temporary repository
> in GitHub under my personal GitHub Id (avijitbasak). The artifacts have been
> copied from commons-numbers. A preliminary structure has been created for
> the proposed component.
>         Please let me know if we want to proceed with this format. We can copy
> the same to any other team repository if required.
>
> Repo URL: https://github.com/avijitbasak/commons-machinelearning
>
> Thanks & Regards
> --Avijit Basak
>
> On Mon, 26 Apr 2021 at 04:49, Paul King <paul.king.as...@gmail.com> wrote:
>
>> On Mon, Apr 26, 2021 at 12:27 AM sebb <seb...@gmail.com> wrote:
>>>
>>> I assume this thread is about the possible ML component.
>>>
>>> If the code was developed by Commons, I assume it could be used as
>>> part of Spark.
>>> However Commons does not currently have many developers who are
>>> familiar with the field.
>>> So it would seem to me better to have development done by a project
>>> which does have relevant experience.
>>>
>>> You say that Spark etc. have lots of jars.
>>> Surely that allows for it to be implemented as a separate jar which
>>> can either be used as part of the Spark platform, or used
>>> independently?
>>
>> The stats I gave were for the current minimal use of those algorithms.
>> Most algorithms are written in Scala, use RDD "dataframes" rather than,
>> say, double arrays, and assume you're running on "the platform", which
>> handles how you might get your data, return results, do logging, etc.,
>> in a potentially concurrent world. Some of those design choices
>> are key to scaling up but don't align with the goal of making the
>> algorithms runnable "independently".
>>
>>> The only other option I see is for Commons to persuade some developers
>>> who are familiar with the field to join Commons to assist with the
>>> algorithms.
>>
>> I agree that is the crux of the issue here. The "commons doesn't have
>> the bandwidth to absorb another algorithm" part of the discussion
>> seems perfectly legit to me. The "and there is an obvious home
>> elsewhere" part of the discussion seemed a little more dubious to me,
>> though obviously that is something which should be considered.
>>
>>> Existing Commons developers can help manage the logistics of packaging
>>> and releasing the code, as this does not require in-depth knowledge of
>>> the design.
>>> However, this only makes sense if the developers skilled in the area are
>>> prepared to assist long-term.
>>>
>>>
>>> On Sat, 24 Apr 2021 at 23:32, Paul King <paul.king.as...@gmail.com> wrote:
>>>>
>>>> Thanks Gilles,
>>>>
>>>> I can provide the same sort of stats for a clustering example
>>>> across commons-math (KMeans) vs Apache Ignite, Apache Spark and
>>>> Rheem/Apache Wayang (incubating) if anyone would find that useful. It
>>>> would no doubt lead to similar conclusions.
>>>>
>>>> Cheers, Paul.
>>>>
>>>> On Sun, Apr 25, 2021 at 8:15 AM Gilles Sadowski <gillese...@gmail.com> wrote:
>>>>>
>>>>> Hello Paul.
>>>>>
>>>>> On Sat, 24 Apr 2021 at 04:42, Paul King <paul.king.as...@gmail.com> wrote:
>>>>>>
>>>>>> I added some more comments relevant to whether the proposed algorithm
>>>>>> belongs somewhere in the commons "math" area back in the Jira:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/MATH-1563
>>>>>
>>>>> Thanks for a "real" user's testimony.
>>>>>
>>>>> As the ML is still the official forum for such a discussion, I'm quoting
>>>>> part of your post on JIRA:
>>>>> ---CUT---
>>>>> For linear regression, taking just one example dataset, commons-math
>>>>> is a couple of library calls for a single 2M library and solves the
>>>>> problem in 240ms. Both Ignite and Spark involve "firing up the
>>>>> platform" and the code is more complex for simple scenarios. Spark
>>>>> has a 181M footprint across 210 jars and solves the problem in about 20s.
>>>>> Ignite has an 87M footprint across 85 jars and solves the problem in >40s.
>>>>> But I can also find more complex scenarios which need to scale,
>>>>> where Ignite and Spark really come into their own.
>>>>> ---CUT---
>>>>>
>>>>> A similar rationale was behind my developing/using the SOFM
>>>>> functionality in the "o.a.c.m.ml.neuralnet" package: I needed a
>>>>> proof of concept, and taking the "lightweight" path seemed more
>>>>> effective than experimenting with those platforms.
>>>>> Admittedly, at that epoch, there were people around who were
>>>>> maintaining the clustering and GA codes; hence, the prototyping
>>>>> of a machine-learning library didn't look strange to anyone.
>>>>>
>>>>> Regards,
>>>>> Gilles
>>>>>
>>>>>>>> [...]
>
> --
> Avijit Basak
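As a rough illustration of the "couple of library calls" mentioned in the quoted stats, a linear-regression fit with commons-math can be as small as the sketch below. This assumes the SimpleRegression class from commons-math3 (o.a.c.m.stat.regression) and uses a few placeholder data points, not the dataset from the JIRA example:

    import org.apache.commons.math3.stat.regression.SimpleRegression;

    public class LinearRegressionSketch {
        public static void main(String[] args) {
            // Fit y = intercept + slope * x on a handful of placeholder points.
            SimpleRegression regression = new SimpleRegression();
            regression.addData(1.0, 2.1);
            regression.addData(2.0, 3.9);
            regression.addData(3.0, 6.2);
            regression.addData(4.0, 8.1);

            // Fitted coefficients and a prediction for a new x value.
            System.out.println("slope     = " + regression.getSlope());
            System.out.println("intercept = " + regression.getIntercept());
            System.out.println("y(5.0)    = " + regression.predict(5.0));
        }
    }

The equivalent Spark or Ignite code would additionally need to fire up the platform and wrap the data in its own structures, which is where the footprint and startup-time differences quoted above come from.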