On 2/6/13 9:03 AM, Gilles wrote:
> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>> On 2/5/13 6:08 AM, Gilles wrote:
>>> Hi.
>>>
>>> In the thread about "static import", Stephen noted that decisions on a
>>> component's evolution are dependent on whether the future of the Java
>>> language is taken into account, or not.
>>> A question on the same theme also arose after the presentation of
>>> Commons Math in FOSDEM 2013.
>>>
>>> If we assume that efficiency is among the important qualities for
>>> Commons Math, the future is to allow usage of the tools provided by
>>> the standard Java library in order to ease the development of
>>> multi-threaded algorithms.
>>>
>>> Maintaining Java 1.5 source compatibility for the reason that we may
>>> need to support legacy applications will turn out to be self-defeating:
>>> 1. New users will not consider Commons Math's features that are
>>>    notably apt to parallel processing.
>>> 2. Current users might at some point simply switch to another library
>>>    if it proves more efficient (because it actually uses
>>>    multi-threading).
>>> 3. New Java developers will be turned away because they will want to
>>>    use the more convenient features of the language in order to
>>>    provide potential contributions.
>>>
>>> If maintaining 1.5 source compatibility is kept as a requirement, the
>>> consequence is that Commons Math will _become_ a legacy library.
>>> In that perspective, implementing/improving algorithms for which a
>>> parallel version is known to be more efficient is plainly a waste of
>>> development and maintenance time.
>>>
>>> In order to mitigate the risks (both of upgrading and of not upgrading
>>> the source compatibility requirement), I would propose to create a new
>>> project (say, "Commons Math MT") where we could implement new
>>> features[1] without being encumbered with the 1.5 requirement.[2]
>>> The "Commons Math MT" would depend on "Commons Math" where we would
>>> continue developing single-thread (and thread-safe) "tasks", i.e.
>>> independent units of processing that could be used in algorithms
>>> located in "Commons Math MT".
>>>
>>> In summary:
>>> - Commons Math (as usual):
>>>   * single-thread (sequential) algorithms,
>>>   * (pure) Java 5,
>>>   * no dependencies.
>>> - Commons Math MT:
>>>   * multi-thread (parallel) algorithms,
>>>   * Java 7 and beyond,
>>>   * JNI allowed,
>>>   * dependencies allowed (jCuda).
>>>
>>> What do you think?
>>
>> There are several other possibilities to consider:
>>
>> 0) Implement multithreading using JDK 1.5 primitives
>> 1) Set things up within [math] to support parallel execution in JDK
>>    1.7, Hadoop or other frameworks
>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>
>> I think we should maintain a version that has no dependencies and no
>> JNI in any case.
>>
>> Starting a branch and getting concrete about how to parallelize some
>> algorithms would be a good way to start. One thing I have not
>> really investigated and would be interested in details on is what
>> you actually get in efficiency gain (or loss?) using fork / join vs
>> just using 1.5+ concurrency for the kinds of problems we would end
>> up using this stuff for.
>>
>> Thinking about specific parallelization problem instances would also
>> help decide whether 1) makes sense (i.e., whether it makes sense as
>> you mention above to maintain a single-threaded library that
>> provides task execution for a multithreaded version or multithreaded
>> frameworks).
>>
>> One more thing to consider is that for at least some users of
>> [math], having the library internally spawn threads and/or peg
>> multiple processors may not be desirable. It is a little misleading
>> to say that multithreading is the way to get "efficiency." It is
>> really the way to *use* more compute resources and unless there are
>> real algorithmic improvements, the overall efficiency may actually
>> be less, due to task coordination overhead. What you get is faster
>> execution due to more greedy utilization of available cores. Actual
>> efficiency (how much overall compute resource it takes to complete a
>> job) partly depends on how efficiently the coordination itself is
>> done (which JDK 1.7 claims to do very well - I have just not seen
>> substantiation or any benchmarks demonstrating this) and how the
>> parallelization affects overall compute requirements. In any case,
>> for environments where library thread-spawning is not desirable, I
>> think we should maintain a single-threaded version.
>>
>
> Unless I missed the point, those reasons are exactly why I propose to
> have 2 projects/components. One, "Commons-Math", does not fiddle with
> resources, while the other would provide a "parallelizationLevel"
> setting for the algorithms written to possibly take advantage of the
> Java 5+ "task framework".

OK, what about the 4.x option?
>
> Yes, we could still be good by using only Java 5's concurrency features
> but the issue I raise is not only about concurrency but about
> evolution/progress/maintenance, all things that require raising
> interest from new contributors (unless it's fine that Commons Math be
> tagged as a "library of the past"...).

+1 for experimenting with parallelization. I would just like to
understand if the JDK 7 stuff really adds much - in particular, does
it handle coordination / CPU allocation better than you could easily
do it with 1.5? More supported JDKs == more potential users, so I
like to see a real reason to bump the JDK level.
>
> But using concurrency features in "Commons Math" would also contradict
> your own point ("we should maintain a single-threaded version"): I
> agree, and that's why I proposed this other project...
>
> As for efficiency (or faster execution, if you want), I don't see the
> point in doubting that tasks like global search (e.g. in a genetic
> algorithm) will complete in less time when run in parallel...
>
> As I summarized previously, having a "Commons Math MT" would bring no
> inconvenience, contrary to either your points 0, 1, or 2. [No
> inconvenience to me, that is, but to people with requirements like
> "Java 5 compatible" or "no multi-threading".]
> As I indicated, the basic "task" could be defined in "Commons Math" and
> "Commons Math MT" would provide the parallelization "glue" (e.g. to
> divide the search space of the GA).

I think it is best at this point to cut a branch and actually start
working on specific algorithms. Having a set of candidate algorithms
for parallelization will help us decide what we actually need and how
it might work. I would personally favor the 4.x approach, with
thread-spawning behavior configurable.

Phil
>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
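
For readers following the thread, here is a minimal sketch of what the
"task" plus "glue" split with a configurable thread-spawning setting
could look like using only the java.util.concurrent executor classes
that have existed since Java 5. Everything named here
(ParallelSearchSketch, IntervalScan, minimize, the objective f, and the
way parallelizationLevel is interpreted) is a hypothetical illustration
and not Commons Math code or anything proposed verbatim in the thread.
It assumes that a level of 1 or less means "never spawn threads".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these names exist in Commons Math. */
public class ParallelSearchSketch {

    /** Single-threaded "task": scan one sub-interval for the smallest f(x). */
    static final class IntervalScan implements Callable<Double> {
        private final double lo;
        private final double hi;
        private final int samples;

        IntervalScan(double lo, double hi, int samples) {
            this.lo = lo;
            this.hi = hi;
            this.samples = samples;
        }

        // Objective chosen only for illustration: f(x) = (x - 2)^2.
        static double f(double x) {
            return (x - 2) * (x - 2);
        }

        public Double call() {
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i <= samples; i++) {
                double x = lo + (hi - lo) * i / samples;
                best = Math.min(best, f(x));
            }
            return best;
        }
    }

    /**
     * "Glue": split [lo, hi] into chunks and run one task per chunk.
     * Assumption: parallelizationLevel <= 1 runs everything in the
     * calling thread, so no threads are spawned by the library.
     */
    public static double minimize(double lo, double hi, int chunks,
                                  int parallelizationLevel) {
        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        double width = (hi - lo) / chunks;
        for (int c = 0; c < chunks; c++) {
            tasks.add(new IntervalScan(lo + c * width, lo + (c + 1) * width, 1000));
        }

        double best = Double.POSITIVE_INFINITY;
        ExecutorService pool = null;
        try {
            if (parallelizationLevel <= 1) {
                // Sequential path: same tasks, caller's thread only.
                for (Callable<Double> task : tasks) {
                    best = Math.min(best, task.call());
                }
            } else {
                // Parallel path: fixed pool sized by the setting.
                pool = Executors.newFixedThreadPool(parallelizationLevel);
                for (Future<Double> result : pool.invokeAll(tasks)) {
                    best = Math.min(best, result.get());
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            if (pool != null) {
                pool.shutdown();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Both calls run the same tasks and should print the same value.
        System.out.println(minimize(0, 4, 4, 1));
        System.out.println(minimize(0, 4, 4, 4));
    }
}
```

Because the tasks are identical on both paths, timing the two calls
against each other on a real algorithm would be one way to measure the
coordination overhead raised above. Swapping the fixed pool for a Java 7
ForkJoinPool would be the corresponding experiment for the fork / join
question. No benchmark results are implied here.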