On 2/7/13 8:04 AM, Gilles wrote:
> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>> On 2/7/13 4:58 AM, Gilles wrote:
>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> In the thread about "static import", Stephen noted that decisions
>>>>>>> on a component's evolution are dependent on whether the future of
>>>>>>> the Java language is taken into account, or not.
>>>>>>> A question on the same theme also arose after the presentation of
>>>>>>> Commons Math in FOSDEM 2013.
>>>>>>>
>>>>>>> If we assume that efficiency is among the important qualities for
>>>>>>> Commons Math, the future is to allow usage of the tools provided
>>>>>>> by the standard Java library in order to ease the development of
>>>>>>> multi-threaded algorithms.
>>>>>>>
>>>>>>> Maintaining Java 1.5 source compatibility for the reason that we
>>>>>>> may need to support legacy applications will turn out to be
>>>>>>> self-defeating:
>>>>>>> 1. New users will not consider Commons Math's features that are
>>>>>>>    notably apt to parallel processing.
>>>>>>> 2. Current users might at some point simply switch to another
>>>>>>>    library if it proves more efficient (because it actually uses
>>>>>>>    multi-threading).
>>>>>>> 3. New Java developers will be turned away because they will want
>>>>>>>    to use the more convenient features of the language in order to
>>>>>>>    provide potential contributions.
>>>>>>>
>>>>>>> If maintaining 1.5 source compatibility is kept as a requirement,
>>>>>>> the consequence is that Commons Math will _become_ a legacy
>>>>>>> library.
>>>>>>> In that perspective, implementing/improving algorithms for which
>>>>>>> a parallel version is known to be more efficient is plainly a
>>>>>>> waste of development and maintenance time.
>>>>>>>
>>>>>>> In order to mitigate the risks (both of upgrading and of not
>>>>>>> upgrading the source compatibility requirement), I would propose
>>>>>>> to create a new project (say, "Commons Math MT") where we could
>>>>>>> implement new features[1] without being encumbered with the 1.5
>>>>>>> requirement.[2]
>>>>>>> The "Commons Math MT" would depend on "Commons Math" where we
>>>>>>> would continue developing single-thread (and thread-safe)
>>>>>>> "tasks", i.e. independent units of processing that could be used
>>>>>>> in algorithms located in "Commons Math MT".
>>>>>>>
>>>>>>> In summary:
>>>>>>> - Commons Math (as usual):
>>>>>>>   * single-thread (sequential) algorithms,
>>>>>>>   * (pure) Java 5,
>>>>>>>   * no dependencies.
>>>>>>> - Commons Math MT:
>>>>>>>   * multi-thread (parallel) algorithms,
>>>>>>>   * Java 7 and beyond,
>>>>>>>   * JNI allowed,
>>>>>>>   * dependencies allowed (jCuda).
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> There are several other possibilities to consider:
>>>>>>
>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>> 1) Set things up within [math] to support parallel execution in
>>>>>>    JDK 1.7, Hadoop or other frameworks
>>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>>>>>
>>>>>> I think we should maintain a version that has no dependencies and
>>>>>> no JNI in any case.
>>>>>>
>>>>>> Starting a branch and getting concrete about how to parallelize
>>>>>> some algorithms would be a good way to start. One thing I have
>>>>>> not really investigated and would be interested in details on is
>>>>>> what you actually get in efficiency gain (or loss?)
>>>>>> using fork / join vs just using 1.5+ concurrency for the kinds
>>>>>> of problems we would end up using this stuff for.
>>>>>>
>>>>>> Thinking about specific parallelization problem instances would
>>>>>> also help decide whether 1) makes sense (i.e., whether it makes
>>>>>> sense as you mention above to maintain a single-threaded library
>>>>>> that provides task execution for a multithreaded version or
>>>>>> multithreaded frameworks).
>>>>>>
>>>>>> One more thing to consider is that for at least some users of
>>>>>> [math], having the library internally spawn threads and/or peg
>>>>>> multiple processors may not be desirable. It is a little
>>>>>> misleading to say that multithreading is the way to get
>>>>>> "efficiency." It is really the way to *use* more compute
>>>>>> resources and unless there are real algorithmic improvements,
>>>>>> the overall efficiency may actually be less, due to task
>>>>>> coordination overhead. What you get is faster execution due to
>>>>>> more greedy utilization of available cores. Actual efficiency
>>>>>> (how much overall compute resource it takes to complete a job)
>>>>>> partly depends on how efficiently the coordination itself is
>>>>>> done (which JDK 1.7 claims to do very well - I have just not
>>>>>> seen substantiation or any benchmarks demonstrating this) and
>>>>>> how the parallelization affects overall compute requirements.
>>>>>> In any case, for environments where library thread-spawning is
>>>>>> not desirable, I think we should maintain a single-threaded
>>>>>> version.
>>>>>>
>>>>>
>>>>> Unless I missed the point, those reasons are exactly why I
>>>>> propose to have 2 projects/components.
>>>>> One, "Commons-Math", does not fiddle with resources, while the
>>>>> other would provide a "parallelizationLevel" setting for the
>>>>> algorithms written to possibly take advantage of the Java 5+
>>>>> "task framework".
>>>>
>>>> OK, what about the 4.x option?
>>>>>
>>>>> Yes, we could still be good by using only Java 5's concurrency
>>>>> features but the issue I raise is not only about concurrency but
>>>>> about evolution/progress/maintenance, all things that require
>>>>> raising interest from new contributors (unless it's fine that
>>>>> Commons Math be tagged as a "library of the past"...).
>>>>
>>>> +1 for experimenting with parallelization. I would just like to
>>>> understand if the JDK 7 stuff really adds much - in particular,
>>>> does it handle coordination / cpu allocation better than you could
>>>> easily do it with 1.5. More supported JDKs == more potential
>>>> users, so I like to see a real reason to bump the JDK level.
>>>>>
>>>>> But using concurrency features in "Commons Math" would also
>>>>> contradict your own point ("we should maintain a single-threaded
>>>>> version"): I agree, and that's why I proposed this other
>>>>> project...
>>>>>
>>>>> As for efficiency (or faster execution, if you want), I don't see
>>>>> the point in doubting that tasks like global search (e.g. in a
>>>>> genetic algorithm) will complete in less time when run in
>>>>> parallel...
>>>>>
>>>>> As I summarized previously, having a "Commons Math MT" would
>>>>> bring no inconvenience, contrary to either your points 0, 1, or
>>>>> 2. [No inconvenience to me, that is, but to people with
>>>>> requirements like "Java 5 compatible" or "no multi-threading"].
>>>>> As I indicated, the basic "task" could be defined in "Commons
>>>>> Math" and "Commons Math MT" would provide the parallelization
>>>>> "glue" (e.g. to divide the search space of the GA).
>>>>
>>>> I think it is best at this point to cut a branch and actually
>>>> start working on specific algorithms. Having a set of candidate
>>>> algorithms for parallelization will help us decide what we
>>>> actually need and how it might work. I would personally favor the
>>>> 4.x approach, with thread-spawning behavior configurable.
>>>
>>> It seems fair to wait until parallel algorithms are actually
>>> implemented.
>>>
>>> However it is not clear what you mean with "the 4.x approach": if
>>> it is actually allowing Java 7, that would mean that, starting from
>>> 4.0, we'll indeed drop support of earlier JVMs!
>>> Why would this be preferred to having 2 projects? Of course, if
>>> everyone agrees to that move to Java 7, that's fine. :-)
>>
>> What I meant was that instead of creating a new component, we would
>> just create a new release line. Like what tomcat does for servlet
>> spec versions. I guess this does mean that we end up having to
>> stabilize the 3.x APIs because no additional "major" release would
>> be allowed in that line. That would be a *good thing* IMO as long
>> as we can do it cleanly. If not, maybe we end up having to use 5.x
>> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
>> current trunk code.
>
> There's still the human resource problem: we don't have it to
> maintain a single branch; having two will only make it worse.
Yes, but the "new project" approach has the same problem.

>
>>>
>>> On the other hand, if we keep Java 5, at least until we get use
>>> cases or contributions that would benefit from features in JDKs
>>> newer than 1.5, there is no need to create a branch; we can just go
>>> on with adding multi-thread codes to the trunk (to become part[1]
>>> of the upcoming 3.x releases).
>>
>> That is why I wanted to get a feel for what the JDK 1.7 stuff really
>> buys you. Has anyone seen benchmarks showing better performance
>> using 1.7 than can be obtained just using 1.5 concurrency
>> primitives?
>
> Again, there are separate issues:
> 1. Coding in Java 7
> 2. Running with the JVM shipped with JDK 1.7
>
> The newer JVMs are faster, independently of whether new features of
> the language are used.
> But it could well be that some of the new features allow even better
> performance (as is foreseen for Java 8).

Agreed. I am interested in understanding better both how much easier
it actually is to code and whether the 1.7 framework materially
improves scheduling / allocation over what you could do just using 1.5
primitives.

>
>> Has anyone used 1.7 to parallelize numerical algorithms
>> and found it really easier / more performant?
>
> Where are those people who could answer?

This is a public list :)

> That is one of the points I raised. If we maintain source
> compatibility with a language version that is 9 years old, not many
> contributors are going to be interested. Thus reducing the chance to
> get answers...
>
>> Any opinions /
>> responses to Konstantin's comment on where parallelization should be
>> implemented - i.e. in the library vs somewhere up the stack?
>
> What was the _question_? ...

The question he implicitly raised was whether or not it makes sense
for a low-level library to parallelize tasks / run across cores. This
is a legitimate question.
It may be better actually to set things up so that higher-level
frameworks or applications can arrange parallel execution rather than
embedding it in the low-level library itself. This is also what I was
referring to when I said that in some contexts, thread-spawning / cpu
hogging may not be desirable.

>
>> Any
>> ideas how to set things up so that [math] code can play nicely with
>> concurrency frameworks?
>
> That's a strange question in the context of a project that tries hard
> not to have any dependency.

I did not mean necessarily to bring in dependencies; but rather to
make it easy for computational tasks executed by [math] code to be
managed by external concurrency frameworks, e.g. Hadoop.

Phil

> If the requirement is to only depend on the standard JDK: the
> framework is in
>   java.util.concurrent
> and all we need to do is to define "tasks" that can be "submitted" to
> an executor:
>
> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
>
>
> Regards,
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
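
For illustration only, here is a minimal sketch of the idea quoted
above: define "tasks" that can be submitted to an executor, and let
the calling application decide how they are run. It uses nothing
beyond java.util.concurrent as shipped since Java 5. The names
ChunkedSum, SumTask and sum are hypothetical and are not existing
[math] API; summing an array is just a stand-in for a real
computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {

    /** Hypothetical library-side task: sums one slice, single-threaded. */
    static final class SumTask implements Callable<Double> {
        private final double[] data;
        private final int from;
        private final int to;

        SumTask(double[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        public Double call() {
            double s = 0;
            for (int i = from; i < to; i++) {
                s += data[i];
            }
            return Double.valueOf(s);
        }
    }

    /**
     * Splits the work into at most "chunks" tasks (chunks must be > 0).
     * The executor is supplied by the caller, so this code never
     * creates threads itself.
     */
    static double sum(double[] data, int chunks, ExecutorService executor)
            throws Exception {
        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        int step = (data.length + chunks - 1) / chunks; // ceiling division
        for (int from = 0; from < data.length; from += step) {
            int to = Math.min(from + step, data.length);
            tasks.add(new SumTask(data, from, to));
        }
        double total = 0;
        // invokeAll blocks until every task has completed.
        for (Future<Double> f : executor.invokeAll(tasks)) {
            total += f.get().doubleValue();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // The application, not the library, picks the pool size.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println(sum(data, 8, executor)); // prints 500500.0
        } finally {
            executor.shutdown();
        }
    }
}
```

In this arrangement the library-side code only builds Callable
instances. A caller who wants to limit thread usage could pass
Executors.newSingleThreadExecutor() instead of a larger pool, without
any change to the task code.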