Hello Jörg.
On Thu, 09 Jun 2016 09:43:06 +0200, Jörg Schaible wrote:
Hi Gilles,
Gilles wrote:
Hi.
On Wed, 8 Jun 2016 23:50:00 +0300, Artem Barger wrote:
On Wed, Jun 8, 2016 at 12:25 AM, Gilles
<gil...@harfang.homelinux.org>
wrote:
According to JIRA, among 180 issues currently targeted for the
next major release (v4.0), 139 have been resolved (75 of which
were not in v3.6.1).
Huh, it's above 75% completion (139/180 ~ 77%) :)
Everybody is welcome to review the "open" issues and comment
about them.
I guess someone needs to prioritize them according to their
importance for the release.
Importance is relative... :-}
IMO, it is important to not release unsupported code.
Unit tests *are* a kind of support.
Unit tests are not what I mean by "support". They only increase the
probability that the code behaves as expected. [And sometimes they do
not because they can be buggy too, as I discovered when refactoring
the "random" package.]
But anyways, my reservations have nothing to do with the functionality
of released code: users who are satisfied with the service provided by
v3.6.1 (or any of the previous versions of CM) have no reason to
upgrade
to 4.0. [By upgrading, all they get is the obligation to change the
"import" statements.]
And we have no reason to release a v4.0 of code that
1. has not changed
2. is not supported
So the priority would be higher for issues that would be included
in the release of the new Commons components.
Hence the need to figure out what these components will be.
Of course, anyone who wishes to maintain some of these codes
(answer user questions, fix bugs, create enhancements, etc.)
is most welcome to step forward.
I can try to cover some of these and maintain relevant code
parts.
Which ones?
I will look into JIRA and provide the issue numbers, and of course
I can cover and assist with the ML part, in particular clustering.
Thanks.
IMO, a maintainer is someone who is able to respond to user
questions and to figure out whether a bug report is valid.
I've been subscribed to the mailing list for quite a while and
haven't seen a lot of questions coming from users.
The "user" ML has always been fairly quiet.
Does it mean that the code is really easy to use?
Or feature-complete (I doubt that)?
Or that there are very few users for the most complex features?
The "dev" ML was usually (much) more active.
The point is that when someone asks a question or proposes a
contribution, there must be someone to answer.
And this is IMHO a wrong assumption. We have a lot of components
where the original authors left long ago. So the situation is not
new.
Having no support is bad (IMO).
[It doesn't have to be from the original authors of course.]
Math is a specialized library, and nobody expects it to be
accompanied by tutorials explaining the theory, or by developers
that act as trainers here on the lists. Users of special algorithms
are supposed to be experts themselves and should understand what
they are doing. Or do you expect that any arbitrary user can use
genetic algorithms or neural network stuff without the mathematical
background?
No, I do not expect that.
[Although it is sometimes part of the resolution of a bug report, and
something that gives a sense of "you are welcome here".]
The main point is about real bugs that won't be handled (see below).
Everything is fine and can be released as long as the existing code
is verified by unit tests. Otherwise we would have to remove a lot
of code every time we release a component ... or do you expect e.g.
that the release manager of VFS completely understands every one of
its providers?
No, certainly not, since I could RM CM. ;-)
But that's not the point!
_Some_ developer(s) should be able to support whatever is in
development.
Otherwise how can it be deemed "in development"?
Just today, two issues were reported on JIRA:
https://issues.apache.org/jira/browse/MATH-172
https://issues.apache.org/jira/browse/MATH-1375
They, unfortunately, illustrate my point.
Moreover, what may be true for VFS is not for CM, where there are
many, many different areas that have nothing in common (except
perhaps some ubiquitous very-low-level utilities, which might be
worth their own component to serve as a, maybe "internal",
dependency).
Also, compare the basic source statistics (lines of code):

                 VFS      CM
  Java code    24215   90834
  Unit tests    8926   95595
All in all, CM is more than 5 times larger than VFS (186429 vs
33141 lines, not even counting documentation).
I think that the clustering part could be generalized into an ML
package as a whole.
Fine I guess, since currently the "neuralnet" sub-package's only
concrete functionality is also a clustering method.
I was also wondering whether the ML package is meant to be extended
in the future.
Really there was no plan, or as many plans as there were
developers...
Putting all these codes (with different designs, different coding
practices, different intended audiences, different levels of
expertise,
etc.) in a single library was not sustainable.
That's why I strongly favour cutting this monolith into pieces
with a limited scope.
Nobody objects, but if you look at vfs, it is still *one* Apache
Commons
component, just with multiple artifacts. All these artifacts are
released
*together*.
Sorry I'm lost, I looked there:
http://commons.apache.org/proper/commons-vfs/download_vfs.cgi
And, it seems that all the functionality is in a single JAR.
[Other files contain the sources, tests, examples.]
Anyways, it is obvious that, in VFS, there is a well defined scope
(a unifying rationale).
No such thing in CM.
What I want to achieve is indeed to create a set of components that are
more like VFS!
This is particularly obvious with the RNGs where there is one unifying
interface, a factory method and multiple implementations.
[Of course, in that case, the new component will be much simpler than
VFS (which is a "good thing", isn't it?).]
Turning math into a multi-project has nothing to do with your
plans to drop mature code,
I am not dropping anything (others did that); I am stating facts and I
now want to spend my time on something (hopefully) worth it. [Working
to modularize unsupported code is a (huge) waste of time.]
Also, in the case of CM, "mature code" is meaningless as an overall
qualifier: some codes are
* new (and never released, e.g. the 64-bit-based RNGs)
* algorithms introduced relatively recently (and perhaps never used)
* old (and sometimes outdated and impossible to fix without breaking
compatibility)
* mostly functional (but impossible to maintain, cf. MATH-1375)
* resulting from a refactoring (hence even when the functionality has
existed for a long time, the code is not "mature")
IMHO, maturity should be visible in the code. It's an impression that
builds up by looking at the code as a whole, and coming to the
conclusion
that indeed there is some overall consistency across files and
packages.
Within some CM packages: yes (even if "mature" would certainly not mean
free of sometimes serious problems).
Across the whole library: certainly *not*.
[For reasons I could expand on. But I did several times (cf. archives)
without succeeding in changing course.]
because you (and currently no one else) cannot
answer questions about its functionality.
See the first post in this thread, in the part about gradually
re-adding
codes if and when they are supported by a new team.
Regards,
Gilles
Cheers,
Jörg