On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
On 2/5/13 6:08 AM, Gilles wrote:
Hi.
In the thread about "static import", Stephen noted that decisions on a
component's evolution are dependent on whether the future of the Java
language is taken into account, or not.
A question on the same theme also arose after the presentation of
Commons Math in FOSDEM 2013.
If we assume that efficiency is among the important qualities for
Commons Math, the future is to allow usage of the tools provided by the
standard Java library in order to ease the development of multi-threaded
algorithms.
Maintaining Java 1.5 source compatibility for the reason that we may
need to support legacy applications will turn out to be self-defeating:
1. New users will not consider Commons Math's features that are notably
apt to parallel processing.
2. Current users might at some point simply switch to another library if
it proves more efficient (because it actually uses multi-threading).
3. New Java developers will be turned away because they will want to use
the more convenient features of the language in order to provide
potential contributions.
If maintaining 1.5 source compatibility is kept as a requirement, the
consequence is that Commons Math will _become_ a legacy library.
In that perspective, implementing/improving algorithms for which a
parallel version is known to be more efficient is plainly a waste of
development and maintenance time.
In order to mitigate the risks (both of upgrading and of not upgrading
the source compatibility requirement), I would propose to create a new
project (say, "Commons Math MT") where we could implement new
features[1] without being encumbered with the 1.5 requirement.[2]
The "Commons Math MT" would depend on "Commons Math" where we would
continue developing single-thread (and thread-safe) "tasks", i.e.
independent units of processing that could be used in algorithms
located in "Commons Math MT".
In summary:
- Commons Math (as usual):
* single-thread (sequential) algorithms,
* (pure) Java 5,
* no dependencies.
- Commons Math MT:
* multi-thread (parallel) algorithms,
* Java 7 and beyond,
* JNI allowed,
* dependencies allowed (jCuda).
What do you think?
There are several other possibilities to consider:
0) Implement multithreading using JDK 1.5 primitives
1) Set things up within [math] to support parallel execution in JDK
1.7, Hadoop or other frameworks
2) Instead of a new project, start a 4.x branch targeting JDK 1.7
I think we should maintain a version that has no dependencies and no
JNI in any case.
Starting a branch and getting concrete about how to parallelize some
algorithms would be a good way to start. One thing I have not
really investigated and would be interested in details on is what
you actually get in efficiency gain (or loss?) using fork / join vs
just using 1.5+ concurrency for the kinds of problems we would end
up using this stuff for.
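To make the fork/join question concrete, here is a minimal sketch of the
same reduction written both ways: once with the Java 5 ExecutorService and
once with the Java 7 ForkJoinPool. It is not a benchmark and not [math]
code; the class SumOfSquares, its method names and the THRESHOLD value are
made up for illustration, and any timing comparison would still have to be
measured on the actual problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; nothing here is part of Commons Math.
class SumOfSquares {

    /** Plain sequential kernel shared by both parallel variants. */
    static double sequential(double[] data, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i] * data[i];
        }
        return sum;
    }

    /** Java 5 style: cut the array into fixed chunks and submit them to a pool. */
    static double withExecutor(final double[] data, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (data.length + nThreads - 1) / nThreads;
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(data.length, start + chunk);
                parts.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return sequential(data, from, to);
                    }
                }));
            }
            double sum = 0;
            for (Future<Double> part : parts) {
                sum += part.get();
            }
            return sum;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Java 7 style: recursive splitting, with work stealing done by the pool. */
    static class SumTask extends RecursiveTask<Double> {
        private static final int THRESHOLD = 1000; // arbitrary cut-off (assumption)
        private final double[] data;
        private final int from;
        private final int to;

        SumTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                return sequential(data, from, to);
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                              // run the left half asynchronously
            return right.compute() + left.join();     // compute the right half here
        }
    }

    static double withForkJoin(double[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = new double[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100;
        }
        System.out.println(withExecutor(data, 4));
        System.out.println(withForkJoin(data, 4));
    }
}
```

Both variants reuse the same sequential kernel, so the only difference
between them is how the work is split and coordinated, which is the part
whose overhead is in question.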
Thinking about specific parallelization problem instances would also
help decide whether 1) makes sense (i.e., whether it makes sense as
you mention above to maintain a single-threaded library that
provides task execution for a multithreaded version or multithreaded
frameworks).
One more thing to consider is that for at least some users of
[math], having the library internally spawn threads and/or peg
multiple processors may not be desirable. It is a little misleading
to say that multithreading is the way to get "efficiency." It is
really the way to *use* more compute resources and unless there are
real algorithmic improvements, the overall efficiency may actually
be less, due to task coordination overhead. What you get is faster
execution due to more greedy utilization of available cores. Actual
efficiency (how much overall compute resource it takes to complete a
job) partly depends on how efficiently the coordination itself is
done (which JDK 1.7 claims to do very well - I have just not seen
substantiation or any benchmarks demonstrating this) and how the
parallelization affects overall compute requirements. In any case,
for environments where library thread-spawning is not desirable, I
think we should maintain a single-threaded version.
Unless I missed the point, those reasons are exactly why I propose to
have 2 projects/components. One, "Commons-Math", does not fiddle with
resources, while the other would provide a "parallelizationLevel"
setting for the algorithms written to possibly take advantage of the
Java 5+ "task framework".
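As a rough sketch of what such a setting could look like (the class name
BatchRunner and its methods are hypothetical, not an existing or proposed
API): a level of 1 runs everything in the caller's thread and spawns
nothing, while a higher level hands the same tasks to a Java 5 thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class for illustration only; not part of Commons Math.
class BatchRunner {
    private final int parallelizationLevel;

    BatchRunner(int parallelizationLevel) {
        if (parallelizationLevel < 1) {
            throw new IllegalArgumentException("level must be >= 1");
        }
        this.parallelizationLevel = parallelizationLevel;
    }

    /** Runs all tasks and returns their results in submission order. */
    <T> List<T> runAll(List<Callable<T>> tasks) {
        List<T> results = new ArrayList<T>();
        try {
            if (parallelizationLevel == 1) {
                // Sequential path: no thread is created by the library.
                for (Callable<T> task : tasks) {
                    results.add(task.call());
                }
                return results;
            }
            ExecutorService pool = Executors.newFixedThreadPool(parallelizationLevel);
            try {
                // invokeAll returns the futures in the same order as the tasks.
                for (Future<T> future : pool.invokeAll(tasks)) {
                    results.add(future.get());
                }
            } finally {
                pool.shutdown();
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

With this shape, a user who does not want the library to spawn threads
simply keeps the level at 1 and gets plain sequential behaviour.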
Yes, we could still be good by using only Java 5's concurrency features
but the issue I raise is not only about concurrency but about
evolution/progress/maintenance, all things that require raising interest
from new contributors (unless it's fine that Commons Math be tagged as a
"library of the past"...).
But using concurrency features in "Commons Math" would also contradict
your own point ("we should maintain a single-threaded version"): I agree,
and that's why I proposed this other project...
As for efficiency (or faster execution, if you want), I don't see the
point in doubting that tasks like global search (e.g. in a genetic
algorithm) will complete in less time when run in parallel...
As I summarized previously, having a "Commons Math MT" would bring no
inconvenience, contrary to either your points 0, 1, or 2. [No
inconvenience to me, that is, but to people with requirements like
"Java 5 compatible" or "no multi-threading".]
As I indicated, the basic "task" could be defined in "Commons Math" and
"Commons Math MT" would provide the parallelization "glue" (e.g. to
divide the search space of the GA).
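A minimal sketch of that split, with a simple grid scan standing in for
the GA (all names here are hypothetical, and the objective function is
fixed only to keep the example short): IntervalSearchTask is the kind of
single-threaded, self-contained unit that would stay in "Commons Math",
and ParallelSearch is the kind of glue that would go in "Commons Math MT".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical "task": single-threaded, touches no shared state.
class IntervalSearchTask implements Callable<double[]> {
    private final double lo;
    private final double hi;
    private final int samples;

    IntervalSearchTask(double lo, double hi, int samples) {
        this.lo = lo;
        this.hi = hi;
        this.samples = samples;
    }

    /** Objective to minimize; hard-coded for illustration (minimum at x = 2). */
    static double objective(double x) {
        return (x - 2) * (x - 2);
    }

    /** Scans [lo, hi] on a regular grid and returns {bestX, bestValue}. */
    public double[] call() {
        double bestX = lo;
        double best = objective(lo);
        for (int i = 1; i <= samples; i++) {
            double x = lo + (hi - lo) * i / samples;
            double value = objective(x);
            if (value < best) {
                best = value;
                bestX = x;
            }
        }
        return new double[] {bestX, best};
    }
}

// Hypothetical "glue": divides the search space and merges the results.
class ParallelSearch {
    static double[] minimize(double lo, double hi, int slices,
                             int samplesPerSlice, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<double[]>> futures = new ArrayList<Future<double[]>>();
            double width = (hi - lo) / slices;
            for (int s = 0; s < slices; s++) {
                double from = lo + s * width;
                double to = lo + (s + 1) * width;
                futures.add(pool.submit(new IntervalSearchTask(from, to, samplesPerSlice)));
            }
            double[] best = null;
            for (Future<double[]> future : futures) {
                double[] candidate = future.get();
                if (best == null || candidate[1] < best[1]) {
                    best = candidate;
                }
            }
            return best;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The task can still be called directly, e.g.
new IntervalSearchTask(-10, 10, 8000).call(), by anyone who wants no
threads at all; only the glue knows about the pool.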
Gilles