Why are y’all having a long discussion on a Vote thread?

Ralph

> On Apr 20, 2021, at 10:33 PM, Paul King <paul.king.as...@gmail.com> wrote:
>
> Hi Avijit Basak,
>
> +1 to thanking you for your offer. Just a couple of comments from
> someone who is only a marginal contributor to the commons project.
>
> I would be keen to see a new commons component incorporating various
> machine learning/data science components. The other main contenders
> that seem to be reasonably actively developed are Smile[1] and Weka[2]
> which are licensed under GPL or LGPL. Such a component would be a
> natural fit for the algorithm you propose. If you look at Apache
> Spark[3] and Apache Ignite[4], they both offer some "machine learning"
> offerings but they tend to only support algorithms which are either
> "embarrassingly" parallel or inherently parallel. They tend not to
> include sequential by nature algorithms. Even "embarrassingly"
> parallel algorithms are often not included since they can typically
> already be used by Spark, Ignite, Beam, Wayang, or home-grown
> threads/fibres.
>
> There has been previous research into PGA with Hadoop, Spark and
> Ignite[5][6] but so far, none of that has made it into those
> distributions as far as I know. I don't know how customisable the
> Ignite GA algorithm[7] is but it might be worth looking into.
>
> With respect to component naming, you either go very broad with "math"
> or something like "datascience", or potentially too narrow with
> something like "ml" or "machinelearning". Of the latter two, "ml" is
> most common when bundled into some other framework. The other
> alternative is to simply come up with another name but the typical
> convention within commons is to use a descriptive to purpose name.
> Numerous "ml" libraries also bundle things like regression into them,
> so there is precedent for such libraries to be algorithms broadly in
> the topic space. On the commons math front, I think regression is
> currently earmarked for statistics but not sure it has made the jump
> as of yet.
> An "ml" home would be equally suitable in my mind.
>
> Having said all of that, as others have pointed out, the volunteer
> space in commons is somewhat lean at the moment. I would be happy to
> help a little from the ASF side of things but machine learning/data
> science isn't my principal area of expertise nor a major aspect in my
> "day job" activities; it probably takes others with interest to fully
> give this the effort it deserves. But sometimes someone has to get the
> ball rolling before other interested parties show up.
>
> Cheers, Paul
>
> [1] https://haifengl.github.io/
> [2] https://www.cs.waikato.ac.nz/ml/weka/
> [3] https://spark.apache.org/mllib/
> [4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
> [5] https://hajirajabeen.github.io/publications/SGA.pdf
> [6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
> [7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html
>
> On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak <avijit.ba...@gmail.com> wrote:
>>
>> Hi
>>
>> I would like to mention a few points here. Genetic Algorithm has a
>> vast range of applications in optimization and search problems. Machine
>> learning is only one of those.
>> If we couple the new GA library with any specific domain like ml it
>> would be meaningless for people working in other domains. They have to
>> incorporate the entire ml library which may be completely unrelated to
>> their project.
>> Coupling it with any technology like Spark might also limit
>> its usability.
>> If a separate component is not approved for this change then we can
>> incorporate the changes as part of *commons.math* library.
>> The same library can be reused in ml or neural network libraries as
>> a dependency.
>> Kindly share further views on this.
>>
>> Thanks & Regards
>> --Avijit Basak
>>
>> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
>>
>>> On Wed, 10 Feb 2021 at 13:19, sebb <seb...@gmail.com> wrote:
>>>>
>>>> Likewise, commons-ml is too cryptic.
>>>>
>>>> Also, the Spark project has a machine-learning library:
>>>>
>>>> https://spark.apache.org/mllib/
>>>
>>> Thanks for the pointer.
>>>
>>>>
>>>> Maybe that would be a better home?
>>>
>>> On the face of it, probably.
>>> [For sure, Avijit should comment on the suggestion.]
>>>
>>> On the other hand, "Commons" is the place where one can pick "bare
>>> bone" implementations, and add the functionality to one's application
>>> without necessarily complying with an overarching framework.
>>> [I don't mean that framework compliance is bad; quite the contrary, it is
>>> hopefully the result of a thorough reflection by experts. But ... cf. the
>>> numerous "no-dependency" discussions ...]
>>>
>>> Actually, concerning Avijit's proposed contribution, didn't I say:[1]
>>> ---CUT---
>>> Thus, I think that we must assess whether the "genetic algorithms"
>>> functionality has a reasonable future within "Apache Commons" (i.e.
>>> potential users and contributors) while there exist other libraries that
>>> seem much more advanced for any serious usage.
>>> ---CUT---
>>>
>>>> I'm also a bit concerned as to whether there are sufficient developers
>>>> here with knowledge of the ML domain to be able to support the code in
>>>> the future.
>>>
>>> An interesting point; by no means a new one (see e.g. [2]).
>>>
>>> Isn't it the same point I've been making about "Commons Math" (CM)?
>>> There have been no releases because nobody here is able (or is willing)
>>> to support it.
>>>
>>> Concerning the support of the purported "machinelearning" component:
>>>  1. Package org.apache.commons.math4.ml.neuralnet
>>>    * I've written it entirely and I have applications that depend on it
>>>      (and I cannot assume that I could easily switch to, or port it to,
>>>      Spark), so I can reasonably ensure that it would be supported.
>>>  2. Package org.apache.commons.math4.ml.clustering
>>>    * Functionality is mentioned in Spark's "mllib" user guide.
>>>    * When a new feature was last contributed[3], it was noticed[4][5][6]
>>>      that improvements were needed (but there was no follow-up).
>>>    * I have an application that depends on it (from CM v3.6.1) but I
>>>      wouldn't support it if shipped in CM v4.0.
>>>  3. Package org.apache.commons.math4.genetics
>>>    * Part of my "end-of-study" project consisted in a GA implementation.
>>>      I've never used the CM implementation, and I don't deny that there
>>>      could be perfectly fine uses of it but, just looking at the code, it
>>>      seems obvious that it cannot compete feature-wise with other
>>>      libraries out there.
>>>    * I've suggested long ago that, without anyone supporting it actively
>>>      (and no known user community), it should be dropped from CM.
>>>    * Avijit expressed a willingness to improve the functionality: Is
>>>      this enough for the PMC to create a new component? From the
>>>      experience with the "clustering" package mentioned above, I'd tend
>>>      to think (unfortunately) that it isn't. He should first explore
>>>      whether the Spark community is interested in the GA functionality
>>>      being moved over there.
>>>
>>> Gilles
>>>
>>> [1] https://issues.apache.org/jira/browse/MATH-1563
>>> [2] https://markmail.org/message/26yxj5vhysdsoety
>>> [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
>>> [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
>>> [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
>>> [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
>>>
>>>>
>>>> On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
>>>>>
>>>>> -1 for commons-ml for the same reasons.
>>>>>
>>>>> What about commons-machine-learning or commons-math-learning? The
>>>>> latter is as long as commons-configuration.
>>>>>
>>>>> Emmanuel Bourg
>>>>>
>>>>>
>>>>> On 2021-02-10 03:27, Ralph Goers wrote:
>>>>>> -1 on commons-ml as the name. My first thought is such a repo would
>>>>>> hold stuff related to mailing lists. Then again maybe it contains
>>>>>> stuff relating to markup languages. Maybe it is Apache’s version of
>>>>>> the ML Programming Language [1].
>>>>>>
>>>>>> However, I wouldn’t be -1 on commons-math-ml, although at best I
>>>>>> would be +0 since it is still not obvious what it would contain.
>>>>>>
>>>>>> Ralph
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>
>> --
>> Avijit Basak