Hi

         Sorry for the delayed response. Thanks for your patience. Please
find my comments below:

 (1) Why not Spark?  [At least post over there (?).]
      --We can move to Spark, but it would be very useful if the library
could also run without Spark. Spark makes more sense in a production
environment, whereas a portable library is more useful in non-production
environments. We can certainly reach out to the Spark team and ask.
 (2) Further develop a monolithic CM?  [Who will do it?]
       --I can help upgrade the GA-related functionality of the existing
library.
 (3) Modularize CM? [Who will do it?]
       --I can help upgrade the GA-related functionality of the existing
library.
 (4) New component (with another name) with the proposed contents?
       --This is the best option if permitted.

      The code I have written can be reused with minor modifications, so
this activity should not take much effort.
      Kindly share further thoughts.

Thanks & Regards
--Avijit Basak


On Sun, 14 Feb 2021 at 19:56, Gilles Sadowski <gillese...@gmail.com> wrote:

> On Sun, 14 Feb 2021 at 09:06, Avijit Basak <avijit.ba...@gmail.com> wrote:
> >
> > Hi
> >
> >        I would like to mention a few points here. Genetic Algorithm has a
> > vast range of applications in optimization and search problems. Machine
> > learning is only one of those.
> >        If we couple the new GA library with any specific domain like ml,
> > it would be meaningless for people working in other domains.
>
> Isn't "meaningless" a slight overstatement?
> We might have an issue of terminology: There is no necessary "coupling"
> but maybe "acquaintance" (for lack of a better word), as a set of tools that
> might come in handy for solving certain types of problems.  [For example,
> the Traveling Salesman Problem can be tackled by GA and SOFM, both
> of which are candidates for inclusion in the new component, although they
> don't share any code.]
>
> If the name "machine learning" is not the most appropriate one to convey
> the intended scope, do you have another idea?
> ["AI" would perhaps be more correct if we consider a strict hierarchy, but
> would obviously be far too presumptuous.]
>
> > They have to
> > incorporate the entire ml library
>
> No, they won't.  Given the stated goal of "modularity": the "ga" module
> will be available as a dedicated JAR (possibly with a dependency to
> codes that can be reused in other modules provided by the component).
>
> > which may be completely unrelated to
> > their project. Coupling it with any technology like Spark might also
> > limit its usability.
>
> You may be right; I have no idea about the "restrictions" imposed by
> Spark.  [It seems that in this case, one would have to indeed depend
> on Spark's "mllib" (?).  This would be one reason, as I already stated,
> for having something in "Commons".]
>
> Could you elaborate on a concrete use-case where one would be
> starting to develop an application with the specific requirement that
> Spark could not be used?
> In particular, IIRC Spark has multi-threading built in.  Don't you see
> it as a huge problem that CM would not provide such a feature?
>
> >        If a separate component is not approved for this change then we can
> > incorporate the changes as part of the *commons.math* library.
>
> Of course, if somebody wants to do that, he's welcome.
> [That will not be me, for all the reasons which I've explained.  In the last
> 5 years I've been pretty much alone in handling bug reports about CM;
> I'm unwilling to assume implicit support for even more codes.]
>
> Also, with this solution, you'd now be willing to accept what you weren't
> above: Anyone wanting to use the GA functionality would indeed have to
> "incorporate" the whole of "Commons Math" (CM).
> Of course, the latter could be modularized, but this will only mitigate the
> issue, as any release of the GA functionality will potentially be then held
> off by potential issues in other parts of CM (which nobody has been able
> to consistently support for more than 5 years now).
>
> >        The same library can be reused in ml or neural network libraries as
> > a dependency.
>
> It is the other way around:  The development version of CM currently
> depends on "lower-level" components.
> Furthermore, right now its (embryonic) "machine learning" functionality
> hasn't any substantial dependency on codes outside the "o.a.c.math4.ml"
> package.
>
> >        Kindly share further views on this.
>
> In summary, to be clarified:
>  (1) Why not Spark?  [At least post over there (?).]
>  (2) Further develop a monolithic CM?  [Who will do it?]
>  (3) Modularize CM? [Who will do it?]
>  (4) New component (with another name) with the proposed contents?
>
> To make things clear from my side:  As a *user*, I've currently some
> stake at having a clean, independent "ml" component or an independent
> "sofm" module.  So I could do (4).  Or help with (3), on the condition that
> *other* people get things moving.
>
> Regards,
> Gilles
>
> >
> > Thanks & Regards
> > --Avijit Basak
> >
> > On Wed, 10 Feb 2021 at 19:49, Gilles Sadowski <gillese...@gmail.com> wrote:
> >
> > > On Wed, 10 Feb 2021 at 13:19, sebb <seb...@gmail.com> wrote:
> > > >
> > > > Likewise, commons-ml is too cryptic.
> > > >
> > > > Also, the Spark project has a machine-learning library:
> > > >
> > > > https://spark.apache.org/mllib/
> > >
> > > Thanks for the pointer.
> > >
> > > >
> > > > Maybe that would be better home?
> > >
> > > On the face of it, probably.
> > > [For sure, Avijit should comment on the suggestion.]
> > >
> > > On the other hand, "Commons" is the place where one can pick "bare
> > > bone" implementations, and add the functionality to one's application
> > > without necessarily complying with an overarching framework.
> > > [I don't mean that framework compliance is bad; quite the contrary, it is
> > > hopefully the result of a thorough reflection by experts.  But ... cf. the
> > > numerous "no-dependency" discussions ...]
> > >
> > > Actually, concerning Avijit's proposed contribution, didn't I say:[1]
> > > ---CUT---
> > > Thus, I think that we must assess whether the "genetic algorithms"
> > > functionality has a reasonable future within "Apache Commons" (i.e.
> > > potential users and contributors) while there exist other libraries that
> > > seem much more advanced for any serious usage.
> > > ---CUT---
> > >
> > > > I'm also a bit concerned as to whether there are sufficient developers
> > > > here with knowledge of the ML domain to be able to support the code in
> > > > the future.
> > >
> > > An interesting point; by all means not a new one (see e.g. [2]).
> > >
> > > Isn't it the same point I've been making about "Commons Math" (CM)?
> > > There have been no releases because nobody here is able (or willing)
> > > to support it.
> > >
> > > Concerning the support of the purported "machinelearning" component:
> > > 1. Package
> > >         org.apache.commons.math4.ml.neuralnet
> > >     * I've written it entirely and I have applications that depend on it
> > >       (and I cannot assume that I could easily switch to, or port it to,
> > >       Spark), so I can reasonably ensure that it would be supported.
> > > 2. Package
> > >         org.apache.commons.math4.ml.clustering
> > >     * Functionality is mentioned in Spark's "mllib" user guide.
> > >     * When a new feature was last contributed[3], it was noticed[4][5][6]
> > >       that improvements were needed (but there was no follow-up).
> > >     * I've an application that depends on it (from CM v3.6.1) but I
> > >       wouldn't support it if shipped in CM v4.0.
> > > 3. Package
> > >         org.apache.commons.math4.genetics
> > >     * Part of my "end-of-study" project consisted in a GA implementation.
> > >       I've never used the CM implementation, and I don't deny that there
> > >       could be perfectly fine uses of it but, just looking at the code,
> > >       it seems obvious that it cannot compete feature-wise with other
> > >       libraries out there.
> > >     * I've suggested long ago that, without anyone supporting it actively
> > >       (and no known user community), it should be dropped from CM.
> > >     * Avijit expressed a willingness to improve the functionality:  Is
> > >       this enough for the PMC to create a new component?  From the
> > >       experience with the "clustering" package mentioned above, I'd tend
> > >       to think (unfortunately) that it isn't.  He should first explore
> > >       whether the Spark community is interested in having the GA
> > >       functionality moved over there.
> > >
> > > Gilles
> > >
> > > [1] https://issues.apache.org/jira/browse/MATH-1563
> > > [2] https://markmail.org/message/26yxj5vhysdsoety
> > > [3] https://issues.apache.org/jira/projects/MATH/issues/MATH-1509
> > > [4] https://issues.apache.org/jira/projects/MATH/issues/MATH-1524
> > > [5] https://issues.apache.org/jira/projects/MATH/issues/MATH-1528
> > > [6] https://issues.apache.org/jira/projects/MATH/issues/MATH-1526
> > >
> > > >
> > > > On Wed, 10 Feb 2021 at 08:27, Emmanuel Bourg <ebo...@apache.org> wrote:
> > > > >
> > > > > -1 for commons-ml for the same reasons.
> > > > >
> > > > > > What about commons-machine-learning or commons-math-learning? The latter
> > > > > > is as long as commons-configuration.
> > > > >
> > > > > Emmanuel Bourg
> > > > >
> > > > >
> > > > > > On 2021-02-10 03:27, Ralph Goers wrote:
> > > > > > > -1 on commons-ml as the name. My first thought is such a repo would
> > > > > > > hold stuff related to mailing lists. Then again maybe it contains
> > > > > > > stuff relating to markup languages. Maybe it is Apache’s version of
> > > > > > > the ML Programming Language [1].
> > > > > > >
> > > > > > > However, I wouldn’t be -1 on commons-math-ml, although at best I would
> > > > > > > be +0 since it is still not obvious what it would contain.
> > > > > >
> > > > > > Ralph
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
> >
> > --
> > Avijit Basak
>

-- 
Avijit Basak
