Hi,

Sorry for the delayed response, and thanks for your patience. Please find my comments below:
(1) Why not Spark? [At least post over there (?).]
--We can move to Spark, but it would be very useful if the library could also run without Spark. Using Spark makes more sense in a production environment, whereas a portable library is more useful in non-production environments. We can certainly reach out to the Spark team and ask.

(2) Further develop a monolithic CM? [Who will do it?]
--I can help with upgrading the GA functionality of the existing library.

(3) Modularize CM? [Who will do it?]
--I can help with upgrading the GA functionality of the existing library.

(4) New component (with another name) with the proposed contents?
--This is the best option, if permitted. The code I have written can be reused with minor modifications, so this would not take much effort.

Kindly share further thoughts.

Thanks & Regards
--Avijit Basak

On Sun, 14 Feb 2021 at 19:56, Gilles Sadowski <gillese...@gmail.com> wrote:

> On Sun, 14 Feb 2021 at 09:06, Avijit Basak <avijit.ba...@gmail.com> wrote:
> >
> > Hi
> >
> > I would like to mention a few points here. Genetic Algorithm has a
> > vast range of applications in optimization and search problems. Machine
> > learning is only one of those.
> > If we couple the new GA library with any specific domain like ml it
> > would be meaningless for people working in other domains.
>
> Isn't "meaningless" a slight overstatement?
> We might have an issue of terminology: There is no necessary "coupling"
> but maybe "acquaintance" (for lack of a better word), as a set of tools that
> might come in handy for solving certain types of problems. [For example,
> the Traveling Salesman Problem can be tackled by GA and SOFM, both
> of which are candidate for inclusion in the new component, although they
> don't share any code.]
>
> If the name "machine learning" is not the most appropriate one to convey
> the intended scope, do you have another idea?
> ["AI" would perhaps be more correct if we consider a strict hierarchy, but
> would obviously be far too presumptuous.]
>
> > They have to
> > incorporate the entire ml library
>
> No, they won't. Given the stated goal of "modularity": the "ga" module
> will be available as a dedicated JAR (possibly with a dependency to
> codes that can be reused in other modules provided by the component).
>
> > which may be completely unrelated to
> > their project. Coupling it with any technology like spark might also limit
> > it's usability.
>
> You may be right; I have no idea about the "restrictions" imposed by
> Spark. [It seems that in this case, one would have to indeed depend
> on Spark's "mllib" (?). This would be one reason, as I already stated,
> for having something in "Commons".]
>
> Could you elaborate on a concrete use-case where one would be
> starting to develop an application with the specific requirement that
> Spark could not be used?
> In particular, IIRC Spark has multi-threading built in. Don't you see
> it as a huge problem that CM would not provide such a feature?
>
> > If a separate component is not approved for this change then we can
> > incorporate the changes as part of *commons.math* library.
>
> Of course, if somebody wants to do that, he's welcome.
> [That will not be me, for all the reasons which I've explained. In the last
> 5 years I've been pretty much alone in handling bug reports about CM;
> I'm unwilling to assume implicit support for even more codes.]
>
> Also, with this solution, you'd now be willing to accept what you weren't
> above: Anyone wanting to use the GA functionality would indeed have to
> "incorporate" the whole of "Commons Math" (CM).
> Of course, the latter could be modularized, but this will only mitigate the
> issue, as any release of the GA functionality will potentially be then held
> off by potential issues in other parts of CM (which nobody has been able
> to consistently support for more than 5 years now).
>
> > The same library can be reused in ml or neural network libraries as
> > a dependency.
>
> It is the other way around: The development version of CM currently
> depends on "lower-level" components.
> Furthermore, right now its (embryonic) "machine learning" functionality
> hasn't any substantial dependency on codes outside the "o.a.c.math4.ml"
> package.
>
> > Kindly share further views on this.
>
> In summary, to be clarified:
> (1) Why not Spark? [At least post over there (?).]
> (2) Further develop a monolithic CM? [Who will do it?]
> (3) Modularize CM? [Who will do it?]
> (4) New component (with another name) with the proposed contents?
>
> To make things clear from my side: As a *user*, I've currently some
> stake at having a clean, independent "ml" component or an independent
> "sofm" module. So I could do (4). Or help with (3), on the condition that
> *other* people get things moving.
>
> Regards,
> Gilles
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> > On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > > On Wed, 10 Feb 2021 at 13:19, sebb <seb...@gmail.com> wrote:
> > > >
> > > > Likewise, commons-ml is too cryptic.
> > > >
> > > > Also, the Spark project has a machine-learning library:
> > > >
> > > > https://spark.apache.org/mllib/
> > >
> > > Thanks for the pointer.
> > >
> > > >
> > > > Maybe that would be better home?
> > >
> > > On the face of it, probably.
> > > [For sure, Avijit should comment on the suggestion.]
> > >
> > > On the other hand, "Commons" is the place where one can pick "bare
> > > bone" implementations, and add the functionality to one's application
> > > without necessarily comply with an overarching framework.
> > > [I don't mean that framework compliance is bad; quite the contrary, it is
> > > hopefully the result of a thorough reflection by experts. But ... cf. the
> > > numerous "no-dependency" discussions ...]
> > >
> > > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > > ---CUT---
> > > Thus, I think that we must assess whether the "genetic algorithms"
> > > functionality has a reasonable future within "Apache Commons" (i.e.
> > > potential users and contributors) while there exist other libraries that
> > > seem much more advanced for any serious usage.
> > > ---CUT---
> > >
> > > > I'm also a bit concerned as to whether there are sufficient developers
> > > > here with knowledge of the ML domain to be able to support the code in
> > > > the future.
> > >
> > > An interesting point; by all means not a new one (see e.g. [2]).
> > >
> > > Isn't it the same point I've been making about "Commons Math" (CM)?
> > > There has been no releases because nobody here is able (or is willing
> > > to) support it.
> > >
> > > Concerning the support of the purported "machinelearning" component:
> > > 1. Package
> > >      org.apache.commons.math4.ml.neuralnet
> > >    * I've written it entirely and I have applications that depend on it
> > >      (and I cannot assume that I could easily switch to, or port it to,
> > >      Spark), so I can reasonably ensure that it would be supported.
> > > 2. Package
> > >      org.apache.commons.math4.ml.clustering
> > >    * Functionality is mentioned in Spark's "mllib" user guide.
> > >    * When a new feature was last contributed[3], it was noticed[4][5][6]
> > >      that improvement were needed (but there was no follow-up).
> > >    * I've an application that depend on it (from CM v3.6.1) but I wouldn't
> > >      support it if shipped in CM v4.0.
> > > 3. Package
> > >      org.apache.commons.math4.genetics
> > >    * Part of my "end-of-study" project consisted in a GA implementation.
> > >      I've never used the CM implementation, and I don't deny that there
> > >      could be perfectly fine uses of it but, just looking at the code, it
> > >      seems obvious that it cannot compete feature-wise with other libraries
> > >      out there.
> > >    * I've suggested long ago that, without anyone supporting it actively
> > >      (and no known user community), it should be dropped from CM.
> > >    * Avijit expressed a willingness to improve the functionality: Is
> > >      this enough for the PMC to create a new component? From the
> > >      experience with the "clustering" package mentioned above, I'd tend
> > >      to think (unfortunately) that it isn't. He should first explore
> > >      whether the Spark community is interested, that the GA functionality
> > >      be moved over there.
> > >
> > > Gilles
> > >
> > > [1] https://issues.apache.org/jira/browse/MATH-1563
> > > [2] https://markmail.org/message/26yxj5vhysdsoety
> > > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> > >
> > > >
> > > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
> > > > >
> > > > > -1 for commons-ml for the same reasons.
> > > > >
> > > > > What about commons-machine-learning or commons-math-learning? The latter
> > > > > is as long as commons-configuration.
> > > > >
> > > > > Emmanuel Bourg
> > > > >
> > > > >
> > > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > > > the ML Programming Language [1].
> > > > > >
> > > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > > > > be +0 since it is still not obvious what it would contain.
> > > > > >
> > > > > > Ralph
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
> >
> > --
> > Avijit Basak
>

--
Avijit Basak