Hi Avijit Basak,

+1 to thanking you for your offer. Just a couple of comments from
someone who is only a marginal contributor to the commons project.

I would be keen to see a new commons component incorporating various
machine learning/data science components. The other main contenders
that seem to be reasonably actively developed are Smile[1] and Weka[2]
which are licensed under GPL or LGPL. Such a component would be a
natural fit for the algorithm you propose. If you look at Apache
Spark[3] and Apache Ignite[4], they both have some "machine learning"
offerings but they tend to only support algorithms which are either
"embarrassingly" parallel or inherently parallel. They tend not to
include inherently sequential algorithms. Even "embarrassingly"
parallel algorithms are often not included since they can typically
already be run with Spark, Ignite, Beam, Wayang, or home-grown
threads/fibres.

There has been previous research into parallel genetic algorithms (PGA)
with Hadoop, Spark and Ignite[5][6] but so far, none of that has made
it into those distributions as far as I know. I don't know how
customisable the Ignite GA implementation[7] is but it might be worth
looking into.

With respect to component naming, you either go very broad with "math"
or something like "datascience", or potentially too narrow with
something like "ml" or "machinelearning". Of the latter two, "ml" is
most common when bundled into some other framework. The other
alternative is to simply come up with another name but the typical
convention within commons is to use a name that describes the purpose.
Numerous "ml" libraries also bundle things like regression into them,
so there is precedent for such libraries to cover algorithms broadly
within the topic space. On the commons math front, I think regression is
currently earmarked for statistics but I'm not sure it has made the jump
yet. An "ml" home would be equally suitable in my mind.

Having said all of that, as others have pointed out, the volunteer
space in commons is somewhat lean at the moment. I would be happy to
help a little from the ASF side of things but machine learning/data
science isn't my principal area of expertise nor a major aspect of my
"day job" activities. It will probably take others with an interest to
give this the effort it deserves. But sometimes someone has to get the
ball rolling before other interested parties show up.

Cheers, Paul

[1] https://haifengl.github.io/
[2] https://www.cs.waikato.ac.nz/ml/weka/
[3] https://spark.apache.org/mllib/
[4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
[5] https://hajirajabeen.github.io/publications/SGA.pdf
[6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
[7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html

On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak <avijit.ba...@gmail.com> wrote:
>
> Hi
>
>        I would like to mention a few points here. Genetic Algorithm has a
> vast range of applications in optimization and search problems. Machine
> learning is only one of those.
>        If we couple the new GA library with any specific domain like ml it
> would be meaningless for people working in other domains. They have to
> incorporate the entire ml library which may be completely unrelated to
> their project. Coupling it with any technology like spark might also limit
> its usability.
>        If a separate component is not approved for this change then we can
> incorporate the changes as part of *commons.math* library.
>        The same library can be reused in ml or neural network libraries as
> a dependency.
>        Kindly share further views on this.
>
> Thanks & Regards
> --Avijit Basak
>
> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
>
> > On Wed, 10 Feb 2021 at 13:19, sebb <seb...@gmail.com> wrote:
> > >
> > > Likewise, commons-ml is too cryptic.
> > >
> > > Also, the Spark project has a machine-learning library:
> > >
> > > https://spark.apache.org/mllib/
> >
> > Thanks for the pointer.
> >
> > >
> > > Maybe that would be better home?
> >
> > On the face of it, probably.
> > [For sure, Avijit should comment on the suggestion.]
> >
> > On the other hand, "Commons" is the place where one can pick "bare
> > bone" implementations, and add the functionality to one's application
> > without necessarily complying with an overarching framework.
> > [I don't mean that framework compliance is bad; quite the contrary, it is
> > hopefully the result of a thorough reflection by experts.  But ... cf. the
> > numerous "no-dependency" discussions ...]
> >
> > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > ---CUT---
> > Thus, I think that we must assess whether the "genetic algorithms"
> > functionality has a reasonable future within "Apache Commons" (i.e.
> > potential users and contributors) while there exist other libraries that
> > seem much more advanced for any serious usage.
> > ---CUT---
> >
> > > I'm also a bit concerned as to whether there are sufficient developers
> > > here with knowledge of the ML domain to be able to support the code in
> > > the future.
> >
> > An interesting point; by no means a new one (see e.g. [2]).
> >
> > Isn't it the same point I've been making about "Commons Math" (CM)?
> > There have been no releases because nobody here is able (or willing)
> > to support it.
> >
> > Concerning the support of the purported "machinelearning" component:
> > 1. Package
> >         org.apache.commons.math4.ml.neuralnet
> >     * I've written it entirely and I have applications that depend on it
> >       (and I cannot assume that I could easily switch to, or port it to,
> >       Spark), so I can reasonably ensure that it would be supported.
> > 2. Package
> >         org.apache.commons.math4.ml.clustering
> >     * Functionality is mentioned in Spark's "mllib" user guide.
> >     * When a new feature was last contributed[3], it was noticed[4][5][6]
> >       that improvements were needed (but there was no follow-up).
> >     * I have an application that depends on it (from CM v3.6.1) but I
> >       wouldn't support it if shipped in CM v4.0.
> > 3. Package
> >         org.apache.commons.math4.genetics
> >     * Part of my "end-of-study" project consisted of a GA implementation.
> >       I've never used the CM implementation, and I don't deny that there
> >       could be perfectly fine uses of it but, just looking at the code, it
> >       seems obvious that it cannot compete feature-wise with other
> >       libraries out there.
> >     * I suggested long ago that, without anyone supporting it actively
> >       (and no known user community), it should be dropped from CM.
> >     * Avijit expressed a willingness to improve the functionality:  Is
> >       this enough for the PMC to create a new component?  From the
> >       experience with the "clustering" package mentioned above, I'd tend
> >       to think (unfortunately) that it isn't.  He should first explore
> >       whether the Spark community is interested in having the GA
> >       functionality moved over there.
> >
> > Gilles
> >
> > [1] https://issues.apache.org/jira/browse/MATH-1563
> > [2] https://markmail.org/message/26yxj5vhysdsoety
> > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> >
> > >
> > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
> > > >
> > > > -1 for commons-ml for the same reasons.
> > > >
> > > > What about commons-machine-learning or commons-math-learning? The
> > > > latter is as long as commons-configuration.
> > > >
> > > > Emmanuel Bourg
> > > >
> > > >
> > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > > the ML Programming Language [1].
> > > > >
> > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I
> > > > > would be +0 since it is still not obvious what it would contain.
> > > > >
> > > > > Ralph
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> --
> Avijit Basak

