On Wed, Apr 21, 2021 at 4:12 PM Ralph Goers <ralph.go...@dslextreme.com> wrote:
>
> Why are y’all having a long discussion on Vote thread?
Fair enough. I am +1 (non-binding).

Cheers, Paul.

>
> On Apr 20, 2021, at 10:33 PM, Paul King <paul.king.as...@gmail.com> wrote:
> >
> > Hi Avijit Basak,
> >
> > +1 to thanking you for your offer. Just a couple of comments from
> > someone who is only a marginal contributor to the commons project.
> >
> > I would be keen to see a new commons component incorporating various
> > machine learning/data science components. The other main contenders
> > that seem to be reasonably actively developed are Smile[1] and Weka[2]
> > which are licensed under GPL or LGPL. Such a component would be a
> > natural fit for the algorithm you propose. If you look at Apache
> > Spark[3] and Apache Ignite[4], they both provide some "machine learning"
> > offerings but they tend to only support algorithms which are either
> > "embarrassingly" parallel or inherently parallel. They tend not to
> > include algorithms that are sequential by nature. Even "embarrassingly"
> > parallel algorithms are often not included since they can typically
> > already be used by Spark, Ignite, Beam, Wayang, or home-grown
> > threads/fibres.
> >
> > There has been previous research into PGA with Hadoop, Spark and
> > Ignite[5][6] but so far, none of that has made it into those
> > distributions as far as I know. I don't know how customisable the
> > Ignite GA algorithm[7] is but it might be worth looking into.
> >
> > With respect to component naming, you either go very broad with "math"
> > or something like "datascience", or potentially too narrow with
> > something like "ml" or "machinelearning". Of the latter two, "ml" is
> > most common when bundled into some other framework. The other
> > alternative is to simply come up with another name but the typical
> > convention within commons is to use a name that describes the purpose.
> > Numerous "ml" libraries also bundle things like regression into them,
> > so there is precedent for such libraries to cover algorithms broadly in
> > the topic space.
> > On the commons math front, I think regression is
> > currently earmarked for statistics but I'm not sure it has made the jump
> > as yet. An "ml" home would be equally suitable in my mind.
> >
> > Having said all of that, as others have pointed out, the volunteer
> > space in commons is somewhat lean at the moment. I would be happy to
> > help a little from the ASF side of things but machine learning/data
> > science isn't my principal area of expertise nor a major aspect of my
> > "day job" activities; it probably takes others with interest to fully
> > give this the effort it deserves. But sometimes someone has to get the
> > ball rolling before other interested parties show up.
> >
> > Cheers, Paul
> >
> > [1] https://haifengl.github.io/
> > [2] https://www.cs.waikato.ac.nz/ml/weka/
> > [3] https://spark.apache.org/mllib/
> > [4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
> > [5] https://hajirajabeen.github.io/publications/SGA.pdf
> > [6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
> > [7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html
> >
> > On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak <avijit.ba...@gmail.com> wrote:
> >>
> >> Hi
> >>
> >> I would like to mention a few points here. Genetic algorithms have a
> >> vast range of applications in optimization and search problems. Machine
> >> learning is only one of those.
> >> If we couple the new GA library with a specific domain like ML, it
> >> would be meaningless for people working in other domains. They would have to
> >> incorporate the entire ML library, which may be completely unrelated to
> >> their project. Coupling it with a technology like Spark might also limit
> >> its usability.
> >> If a separate component is not approved for this change, then we can
> >> incorporate the changes as part of the *commons.math* library.
> >> The same library can be reused in ML or neural network libraries as
> >> a dependency.
> >> Kindly share further views on this.
> >>
> >> Thanks & Regards
> >> --Avijit Basak
> >>
> >> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
> >>
> >>> On Wed, Feb 10, 2021 at 13:19, sebb <seb...@gmail.com> wrote:
> >>>>
> >>>> Likewise, commons-ml is too cryptic.
> >>>>
> >>>> Also, the Spark project has a machine-learning library:
> >>>>
> >>>> https://spark.apache.org/mllib/
> >>>
> >>> Thanks for the pointer.
> >>>
> >>>>
> >>>> Maybe that would be a better home?
> >>>
> >>> On the face of it, probably.
> >>> [For sure, Avijit should comment on the suggestion.]
> >>>
> >>> On the other hand, "Commons" is the place where one can pick "bare
> >>> bone" implementations, and add the functionality to one's application
> >>> without necessarily complying with an overarching framework.
> >>> [I don't mean that framework compliance is bad; quite the contrary, it is
> >>> hopefully the result of a thorough reflection by experts. But ... cf. the
> >>> numerous "no-dependency" discussions ...]
> >>>
> >>> Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> >>> ---CUT---
> >>> Thus, I think that we must assess whether the "genetic algorithms"
> >>> functionality has a reasonable future within "Apache Commons" (i.e.
> >>> potential users and contributors) while there exist other libraries that
> >>> seem much more advanced for any serious usage.
> >>> ---CUT---
> >>>
> >>>> I'm also a bit concerned as to whether there are sufficient developers
> >>>> here with knowledge of the ML domain to be able to support the code in
> >>>> the future.
> >>>
> >>> An interesting point; by no means a new one (see e.g. [2]).
> >>>
> >>> Isn't it the same point I've been making about "Commons Math" (CM)?
> >>> There have been no releases because nobody here is able (or willing)
> >>> to support it.
> >>>
> >>> Concerning the support of the purported "machinelearning" component:
> >>> 1. Package
> >>>    org.apache.commons.math4.ml.neuralnet
> >>>    * I've written it entirely and I have applications that depend on it
> >>>      (and I cannot assume that I could easily switch to, or port it to,
> >>>      Spark), so I can reasonably ensure that it would be supported.
> >>> 2. Package
> >>>    org.apache.commons.math4.ml.clustering
> >>>    * Functionality is mentioned in Spark's "mllib" user guide.
> >>>    * When a new feature was last contributed[3], it was noticed[4][5][6]
> >>>      that improvements were needed (but there was no follow-up).
> >>>    * I have an application that depends on it (from CM v3.6.1) but I wouldn't
> >>>      support it if shipped in CM v4.0.
> >>> 3. Package
> >>>    org.apache.commons.math4.genetics
> >>>    * Part of my "end-of-study" project consisted in a GA implementation.
> >>>      I've never used the CM implementation, and I don't deny that there
> >>>      could be perfectly fine uses of it but, just looking at the code, it
> >>>      seems obvious that it cannot compete feature-wise with other libraries
> >>>      out there.
> >>>    * I've suggested long ago that, without anyone supporting it actively
> >>>      (and no known user community), it should be dropped from CM.
> >>>    * Avijit expressed a willingness to improve the functionality: Is this enough
> >>>      for the PMC to create a new component? From the experience with the
> >>>      "clustering" package mentioned above, I'd tend to think (unfortunately)
> >>>      that it isn't. He should first explore whether the Spark community is
> >>>      interested in having the GA functionality moved over there.
> >>>
> >>> Gilles
> >>>
> >>> [1] https://issues.apache.org/jira/browse/MATH-1563
> >>> [2] https://markmail.org/message/26yxj5vhysdsoety
> >>> [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> >>> [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> >>> [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> >>> [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> >>>
> >>>>
> >>>> On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
> >>>>>
> >>>>> -1 for commons-ml for the same reasons.
> >>>>>
> >>>>> What about commons-machine-learning or commons-math-learning? The latter
> >>>>> is as long as commons-configuration.
> >>>>>
> >>>>> Emmanuel Bourg
> >>>>>
> >>>>>
> >>>>> On 2021-02-10 03:27, Ralph Goers wrote:
> >>>>>> -1 on commons-ml as the name. My first thought is such a repo would
> >>>>>> hold stuff related to mailing lists. Then again maybe it contains
> >>>>>> stuff relating to markup languages. Maybe it is Apache’s version of
> >>>>>> the ML Programming Language [1].
> >>>>>>
> >>>>>> However, I wouldn’t be -1 on commons-math-ml, although at best I would
> >>>>>> be +0 since it is still not obvious what it would contain.
> >>>>>>
> >>>>>> Ralph
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >>> For additional commands, e-mail: dev-h...@commons.apache.org
> >>>
> >>
> >> --
> >> Avijit Basak