Re: [Vote] Create a "machine learning" component

Ralph Goers Tue, 20 Apr 2021 23:12:51 -0700

Why are y’all having a long discussion on a Vote thread?

Ralph


> On Apr 20, 2021, at 10:33 PM, Paul King <paul.king.as...@gmail.com> wrote:
> 
> Hi Avijit Basak,
> 
> +1 to thanking you for your offer. Just a couple of comments from
> someone who is only a marginal contributor to the commons project.
> 
> I would be keen to see a new commons component incorporating various
> machine learning/data science components. The other main contenders
> that seem to be reasonably actively developed are Smile[1] and Weka[2]
> which are licensed under GPL or LGPL. Such a component would be a
> natural fit for the algorithm you propose. If you look at Apache
> Spark[3] and Apache Ignite[4], they both offer some "machine learning"
> offerings but they tend to only support algorithms which are either
> "embarrassingly" parallel or inherently parallel. They tend not to
> include algorithms that are sequential by nature. Even "embarrassingly"
> parallel algorithms are often not included, since they can typically
> already be handled by Spark, Ignite, Beam, Wayang, or home-grown
> threads/fibres.
> 
> There has been previous research into PGA with Hadoop, Spark and
> Ignite[5][6] but so far, none of that has made it into those
> distributions as far as I know. I don't know how customisable the
> Ignite GA algorithm[7] is but it might be worth looking into.
> 
> With respect to component naming, you either go very broad with "math"
> or something like "datascience", or potentially too narrow with
> something like "ml" or "machinelearning". Of the latter two, "ml" is
> most common when bundled into some other framework. The other
> alternative is to simply come up with another name, but the typical
> convention within commons is to use a name that describes its purpose.
> Numerous "ml" libraries also bundle things like regression into them,
> so there is precedent for such libraries to cover algorithms broadly in
> the topic space. On the commons math front, I think regression is
> currently earmarked for statistics, but I'm not sure it has made the
> jump yet. An "ml" home would be equally suitable in my mind.
> 
> Having said all of that, as others have pointed out, the volunteer
> space in commons is somewhat lean at the moment. I would be happy to
> help a little from the ASF side of things, but machine learning/data
> science isn't my principal area of expertise nor a major aspect of my
> "day job" activities, so it will probably take others with interest to
> give this the full effort it deserves. But sometimes someone has to get
> the ball rolling before other interested parties show up.
> 
> Cheers, Paul
> 
> [1] https://haifengl.github.io/
> [2] https://www.cs.waikato.ac.nz/ml/weka/
> [3] https://spark.apache.org/mllib/
> [4] https://ignite.apache.org/docs/latest/machine-learning/machine-learning
> [5] https://hajirajabeen.github.io/publications/SGA.pdf
> [6] https://dzone.com/articles/genetic-algorithms-with-apache-ignite
> [7] https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/ml/util/genetic/GeneticAlgorithm.html
> 
> On Sun, Feb 14, 2021 at 6:06 PM Avijit Basak <avijit.ba...@gmail.com> wrote:
>> 
>> Hi
>> 
>>       I would like to mention a few points here. Genetic Algorithms have a
>> vast range of applications in optimization and search problems. Machine
>> learning is only one of those.
>>       If we couple the new GA library with a specific domain like ml, it
>> would be of little use to people working in other domains. They would
>> have to incorporate the entire ml library, which may be completely
>> unrelated to their project. Coupling it with a specific technology like
>> Spark might also limit its usability.
>>       If a separate component is not approved for this change, then we can
>> incorporate the changes into the *commons.math* library.
>>       The same library can be reused in ml or neural network libraries as
>> a dependency.
>>       Kindly share further views on this.
>> 
>> Thanks & Regards
>> --Avijit Basak
>> 
>> On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
>> 
>>> On Wed, 10 Feb 2021 at 13:19, sebb <seb...@gmail.com> wrote:
>>>> 
>>>> Likewise, commons-ml is too cryptic.
>>>> 
>>>> Also, the Spark project has a machine-learning library:
>>>> 
>>>> https://spark.apache.org/mllib/
>>> 
>>> Thanks for the pointer.
>>> 
>>>> 
>>>> Maybe that would be a better home?
>>> 
>>> On the face of it, probably.
>>> [For sure, Avijit should comment on the suggestion.]
>>> 
>>> On the other hand, "Commons" is the place where one can pick "bare
>>> bones" implementations and add the functionality to one's application
>>> without necessarily complying with an overarching framework.
>>> [I don't mean that framework compliance is bad; quite the contrary, it is
>>> hopefully the result of a thorough reflection by experts.  But ... cf. the
>>> numerous "no-dependency" discussions ...]
>>> 
>>> Actually, concerning Avijit's proposed contribution, didn't I say:[1]
>>> ---CUT---
>>> Thus, I think that we must assess whether the "genetic algorithms"
>>> functionality has a reasonable future within "Apache Commons" (i.e.
>>> potential users and contributors) while there exist other libraries that
>>> seem much more advanced for any serious usage.
>>> ---CUT---
>>> 
>>>> I'm also a bit concerned as to whether there are sufficient developers
>>>> here with knowledge of the ML domain to be able to support the code in
>>>> the future.
>>> 
>>> An interesting point; by no means a new one (see e.g. [2]).
>>> 
>>> Isn't it the same point I've been making about "Commons Math" (CM)?
>>> There have been no releases because nobody here is able (or willing)
>>> to support it.
>>> 
>>> Concerning the support of the purported "machinelearning" component:
>>> 1. Package
>>>        org.apache.commons.math4.ml.neuralnet
>>>    * I've written it entirely and I have applications that depend on it
>>>      (and I cannot assume that I could easily switch to, or port it to,
>>>      Spark), so I can reasonably ensure that it would be supported.
>>> 2. Package
>>>        org.apache.commons.math4.ml.clustering
>>>    * Functionality is mentioned in Spark's "mllib" user guide.
>>>    * When a new feature was last contributed[3], it was noticed[4][5][6]
>>>      that improvements were needed (but there was no follow-up).
>>>    * I have an application that depends on it (from CM v3.6.1), but I
>>>      wouldn't support it if shipped in CM v4.0.
>>> 3. Package
>>>        org.apache.commons.math4.genetics
>>>    * Part of my "end-of-study" project consisted of a GA implementation.
>>>      I've never used the CM implementation, and I don't deny that there
>>>      could be perfectly fine uses of it, but just looking at the code, it
>>>      seems obvious that it cannot compete feature-wise with other
>>>      libraries out there.
>>>    * I suggested long ago that, without anyone supporting it actively
>>>      (and no known user community), it should be dropped from CM.
>>>    * Avijit expressed a willingness to improve the functionality: Is
>>>      this enough for the PMC to create a new component? From the
>>>      experience with the "clustering" package mentioned above, I'd tend
>>>      to think (unfortunately) that it isn't. He should first explore
>>>      whether the Spark community is interested in having the GA
>>>      functionality moved over there.
>>> 
>>> Gilles
>>> 
>>> [1] https://issues.apache.org/jira/browse/MATH-1563
>>> [2] https://markmail.org/message/26yxj5vhysdsoety
>>> [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
>>> [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
>>> [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
>>> [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
>>> 
>>>> 
>>>> On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
>>>>> 
>>>>> -1 for commons-ml for the same reasons.
>>>>> 
>>>>> What about commons-machine-learning or commons-math-learning? The
>>>>> latter is as long as commons-configuration.
>>>>> 
>>>>> Emmanuel Bourg
>>>>> 
>>>>> 
>>>>> On 2021-02-10 03:27, Ralph Goers wrote:
>>>>>> -1 on commons-ml as the name. My first thought is such a repo would
>>>>>> hold stuff related to mailing lists. Then again maybe it contains
>>>>>> stuff relating to markup languages. Maybe it is Apache’s version of
>>>>>> the ML Programming Language [1].
>>>>>> 
>>>>>> However, I wouldn’t be -1 on commons-math-ml, although at best I would
>>>>>> be +0 since it is still not obvious what it would contain.
>>>>>> 
>>>>>> Ralph
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>> 
>>> 
>> 
>> --
>> Avijit Basak
> 
