On 08/02/2013 03:21, Konstantin Berlin wrote:
> Sorry, but none of this is making sense to me. We had a long discussion about how the library doesn't test for large-scale problem performance. A lot of algorithms probably do not scale well as a result. There was talk of dropping sparse support in linear algebra. So instead of fixing that, you jump to parallelization, which is needed only for large-scale problems, which this library does not handle well even in a single thread right now.
>
> The most significant impact you can have is fixing the linear algebra component.
I agree with this. Also, to avoid spreading our attention too thinly across several branches that must be kept in sync, I would suggest that we not create a new component but instead decide directly that we will no longer support Java 5 as of Apache Commons Math 4.0, so that people can progressively use the new features of the language and experiment directly on the trunk.

Best regards,
Luc

>
> On Feb 7, 2013, at 5:06 PM, Gilles <gil...@harfang.homelinux.org> wrote:
>
>> On Thu, 07 Feb 2013 08:32:46 -0800, Phil Steitz wrote:
>>> On 2/7/13 8:04 AM, Gilles wrote:
>>>> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>>>>> On 2/7/13 4:58 AM, Gilles wrote:
>>>>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>>>>> Hi.
>>>>>>>>>>
>>>>>>>>>> In the thread about "static import", Stephen noted that decisions on a component's evolution depend on whether or not the future of the Java language is taken into account.
>>>>>>>>>> A question on the same theme also arose after the presentation of Commons Math at FOSDEM 2013.
>>>>>>>>>>
>>>>>>>>>> If we assume that efficiency is among the important qualities for Commons Math, the future is to allow usage of the tools provided by the standard Java library in order to ease the development of multi-threaded algorithms.
>>>>>>>>>>
>>>>>>>>>> Maintaining Java 1.5 source compatibility for the reason that we may need to support legacy applications will turn out to be self-defeating:
>>>>>>>>>> 1. New users will not consider Commons Math's features that are notably apt to parallel processing.
>>>>>>>>>> 2.
>>>>>>>>>> Current users might at some point simply switch to another library if it proves more efficient (because it actually uses multi-threading).
>>>>>>>>>> 3. New Java developers will be turned away because they will want to use the more convenient features of the language in order to provide potential contributions.
>>>>>>>>>>
>>>>>>>>>> If maintaining 1.5 source compatibility is kept as a requirement, the consequence is that Commons Math will _become_ a legacy library.
>>>>>>>>>> In that perspective, implementing/improving algorithms for which a parallel version is known to be more efficient is plainly a waste of development and maintenance time.
>>>>>>>>>>
>>>>>>>>>> In order to mitigate the risks (both of upgrading and of not upgrading the source compatibility requirement), I would propose to create a new project (say, "Commons Math MT") where we could implement new features[1] without being encumbered with the 1.5 requirement.[2]
>>>>>>>>>> The "Commons Math MT" would depend on "Commons Math" where we would continue developing single-thread (and thread-safe) "tasks", i.e. independent units of processing that could be used in algorithms located in "Commons Math MT".
>>>>>>>>>>
>>>>>>>>>> In summary:
>>>>>>>>>> - Commons Math (as usual):
>>>>>>>>>>   * single-thread (sequential) algorithms,
>>>>>>>>>>   * (pure) Java 5,
>>>>>>>>>>   * no dependencies.
>>>>>>>>>> - Commons Math MT:
>>>>>>>>>>   * multi-thread (parallel) algorithms,
>>>>>>>>>>   * Java 7 and beyond,
>>>>>>>>>>   * JNI allowed,
>>>>>>>>>>   * dependencies allowed (jCuda).
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> There are several other possibilities to consider:
>>>>>>>>>
>>>>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>>>>> 1) Set things up within [math] to support parallel execution in JDK 1.7, Hadoop or other frameworks
>>>>>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>>>>>>>>
>>>>>>>>> I think we should maintain a version that has no dependencies and no JNI in any case.
>>>>>>>>>
>>>>>>>>> Starting a branch and getting concrete about how to parallelize some algorithms would be a good way to start. One thing I have not really investigated and would be interested in details on is what you actually get in efficiency gain (or loss?) using fork / join vs just using 1.5+ concurrency for the kinds of problems we would end up using this stuff for.
>>>>>>>>>
>>>>>>>>> Thinking about specific parallelization problem instances would also help decide whether 1) makes sense (i.e., whether it makes sense as you mention above to maintain a single-threaded library that provides task execution for a multithreaded version or multithreaded frameworks).
>>>>>>>>>
>>>>>>>>> One more thing to consider is that for at least some users of [math], having the library internally spawn threads and/or peg multiple processors may not be desirable. It is a little misleading to say that multithreading is the way to get "efficiency." It is really the way to *use* more compute resources and unless there are real algorithmic improvements, the overall efficiency may actually be less, due to task coordination overhead. What you get is faster execution due to more greedy utilization of available cores.
>>>>>>>>> Actual efficiency (how much overall compute resource it takes to complete a job) partly depends on how efficiently the coordination itself is done (which JDK 1.7 claims to do very well - I have just not seen substantiation or any benchmarks demonstrating this) and how the parallelization affects overall compute requirements. In any case, for environments where library thread-spawning is not desirable, I think we should maintain a single-threaded version.
>>>>>>>>
>>>>>>>> Unless I missed the point, those reasons are exactly why I propose to have 2 projects/components. One, "Commons-Math", does not fiddle with resources, while the other would provide a "parallelizationLevel" setting for the algorithms written to possibly take advantage of the Java 5+ "task framework".
>>>>>>>
>>>>>>> OK, what about the 4.x option?
>>>>>>>>
>>>>>>>> Yes, we could still be good by using only Java 5's concurrency features, but the issue I raise is not only about concurrency but about evolution/progress/maintenance, all things that require raising interest from new contributors (unless it's fine that Commons Math be tagged as a "library of the past"...).
>>>>>>>
>>>>>>> +1 for experimenting with parallelization. I would just like to understand if the JDK 7 stuff really adds much - in particular, does it handle coordination / cpu allocation better than you could easily do it with 1.5? More supported JDKs == more potential users, so I like to see a real reason to bump the JDK level.
>>>>>>>>
>>>>>>>> But using concurrency features in "Commons Math" would also contradict your own point ("we should maintain a single-threaded version"): I agree, and that's why I proposed this other project...
>>>>>>>>
>>>>>>>> As for efficiency (or faster execution, if you want), I don't see the point in doubting that tasks like global search (e.g. in a genetic algorithm) will complete in less time when run in parallel...
>>>>>>>>
>>>>>>>> As I summarized previously, having a "Commons Math MT" would bring no inconvenience, contrary to any of your points 0, 1, or 2. [No inconvenience to me, that is, but to people with requirements like "Java 5 compatible" or "no multi-threading".]
>>>>>>>> As I indicated, the basic "task" could be defined in "Commons Math" and "Commons Math MT" would provide the parallelization "glue" (e.g. to divide the search space of the GA).
>>>>>>>
>>>>>>> I think it is best at this point to cut a branch and actually start working on specific algorithms. Having a set of candidate algorithms for parallelization will help us decide what we actually need and how it might work. I would personally favor the 4.x approach, with thread-spawning behavior configurable.
>>>>>>
>>>>>> It seems fair to wait until parallel algorithms are actually implemented.
>>>>>>
>>>>>> However, it is not clear what you mean by "the 4.x approach": if it is actually allowing Java 7, that would mean that, starting from 4.0, we'll indeed drop support of earlier JVMs!
>>>>>> Why would this be preferred to having 2 projects? Of course, if everyone agrees to that move to Java 7, that's fine. :-)
>>>>>
>>>>> What I meant was that instead of creating a new component, we would just create a new release line.
>>>>> Like what Tomcat does for servlet spec versions. I guess this does mean that we end up having to stabilize the 3.x APIs because no additional "major" release would be allowed in that line. That would be a *good thing* IMO as long as we can do it cleanly. If not, maybe we end up having to use 5.x for the JDK 1.7+ version, using 4.0 to get to a stable API for the current trunk code.
>>>>
>>>> There's still the human resource problem: we don't have the resources to maintain a single branch; having two will only make it worse.
>>>
>>> Yes, but the "new project" approach has the same problem.
>>
>> Yes.
>> However, I meant it as a way to separate concerns, as shown by diverging opinions, even among the few people who take part in this discussion or in previous ones about the same subject.
>>
>> A sibling (not separate!) project could allow interested people to experiment while not adding yet another "distraction" to the main project, where people more focused on the mathematical (for lack of a better word) side can continue their own improvements.
>> A healthy interaction could even come out of having a "public" use-case in the form of a project that needs certain facilities (algorithms as tasks) in order to provide multi-thread utilities to users (who might prefer not to have to implement them themselves at a higher level).
>>
>>>>>> On the other hand, if we keep Java 5, at least until we get use cases or contributions that would benefit from features in JDKs newer than 1.5, there is no need to create a branch; we can just go on with adding multi-thread codes to the trunk (to become part[1] of the upcoming 3.x releases).
>>>>>
>>>>> That is why I wanted to get a feel for what the JDK 1.7 stuff really buys you. Has anyone seen benchmarks showing better performance using 1.7 than can be obtained just using 1.5 concurrency primitives?
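For anyone not familiar with the "JDK 1.7 stuff" being compared here, this is roughly what the fork/join style looks like. It is only a minimal sketch, not Commons Math code: the class name, the sum-of-squares workload and the THRESHOLD value are all made up for illustration, and it makes no claim about performance relative to 1.5 primitives.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical example: sum of squares via JDK 1.7 fork/join. */
final class ForkJoinSumOfSquares extends RecursiveTask<Double> {
    private static final long serialVersionUID = 1L;

    /** Below this size the range is summed sequentially (arbitrary value). */
    private static final int THRESHOLD = 1000;

    private final double[] data;
    private final int from;
    private final int to;

    ForkJoinSumOfSquares(double[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: plain sequential loop.
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i] * data[i];
            }
            return sum;
        }
        // Split in two halves; fork the left one, compute the right one here.
        int mid = (from + to) >>> 1;
        ForkJoinSumOfSquares left = new ForkJoinSumOfSquares(data, from, mid);
        ForkJoinSumOfSquares right = new ForkJoinSumOfSquares(data, mid, to);
        left.fork();
        double rightSum = right.compute();
        return left.join() + rightSum;
    }

    /** Runs the task on a dedicated pool with the requested parallelism (>= 1). */
    static double sum(double[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ForkJoinSumOfSquares(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

The splitting and work distribution are handled by the pool; whether that actually beats a hand-rolled Java 5 executor for our kind of problems is exactly what would need to be benchmarked.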
>>>>
>>>> Again, there are separate issues:
>>>> 1. Coding in Java 7
>>>> 2. Running with the JVM shipped with JDK 1.7
>>>>
>>>> The newer JVMs are faster, independently of whether new features of the language are used.
>>>> But it could well be that some of the new features allow even better performance (as is foreseen for Java 8).
>>>
>>> Agreed. I am interested in understanding better both how much easier it actually is to code and whether the 1.7 framework materially improves scheduling / allocation over what you could do just using 1.5 primitives.
>>
>> I cannot provide proof, but nor is anyone on this list eager to prove the contrary; hence the proposal to set up a "playground".
>>
>>>>> Has anyone used 1.7 to parallelize numerical algorithms and found it really easier / more performant?
>>>>
>>>> Where are those people who could answer?
>>>
>>> This is a public list :)
>>>> That is one of the points I raised. If we maintain source compatibility with a language version that is 9 years old, not many contributors are going to be interested, thus reducing the chance of getting answers...
>>>>
>>>>> Any opinions / responses to Konstantin's comment on where parallelization should be implemented - i.e. in the library vs somewhere up the stack?
>>>>
>>>> What was the _question_? ...
>>>
>>> The question he implicitly raised was whether or not it makes sense for a low-level library to parallelize tasks / run across cores.
>>
>> In several areas, CM is not a low-level library (GA, multi-start optimizers for example). In other areas like FFT, a user can legitimately expect top performance without having to handle parallelization by himself.
>>
>>> This is a legitimate question. It may be better actually to set things up so that higher-level frameworks or applications can arrange parallel execution rather than embedding it in the low-level library itself.
>>> This is also what I was referring to when I said that in some contexts, thread-spawning / cpu hogging may not be desirable.
>>
>> For several cases (GA, FFT, multi-start optimizers), I have the opposite viewpoint: multi-threading is an implementation detail that could be handled at a _lower_ level. Of course, the user can decide whether to enable more than one thread.
>>
>>>>> Any ideas how to set things up so that [math] code can play nicely with concurrency frameworks?
>>>>
>>>> That's a strange question in the context of a project that tries hard not to have any dependency.
>>>
>>> I did not mean necessarily to bring in dependencies, but rather to make it easy for computational tasks executed by [math] code to be managed by external concurrency frameworks, e.g. Hadoop.
>>
>> In the context of Commons Math, we often heard that "no dependency" is good. Then, it is also good to not impose _implicit_ dependencies (like: "If you use Hadoop, you could have better performance"). In a way, the CM development "model" is: "We provide a toolkit of efficient procedures, and you, the user, get top performance (on a best effort basis of course)."
>> If we can provide better performance through multi-threading, why not?
>> Nobody will be forced to use it: they will use the "basic" (sequential) tasks, or set the "parallelizationLevel" setting to 1.
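To make the "parallelizationLevel" idea concrete, here is a minimal sketch of how such a setting could sit on top of java.util.concurrent using only the Java 5 API. Nothing here exists in Commons Math: the class name, the method names and the sum-of-squares workload are assumptions chosen for illustration. The sequential "task" is a plain method, and the parallel "glue" only splits the index range and submits Callables to an executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example of a "parallelizationLevel" setting (Java 5 syntax). */
final class ParallelSumOfSquares {
    private ParallelSumOfSquares() {}

    /** The basic sequential "task": sum of x*x over data[from, to). */
    static double sumSquares(double[] data, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i] * data[i];
        }
        return sum;
    }

    /** With level 1 no thread is spawned; otherwise the range is split into chunks. */
    static double compute(final double[] data, int parallelizationLevel) {
        if (parallelizationLevel < 1) {
            throw new IllegalArgumentException("parallelizationLevel must be >= 1");
        }
        if (parallelizationLevel == 1) {
            return sumSquares(data, 0, data.length); // runs in the caller's thread
        }
        ExecutorService executor = Executors.newFixedThreadPool(parallelizationLevel);
        try {
            // Ceiling division so that at most parallelizationLevel chunks are created.
            int chunk = (data.length + parallelizationLevel - 1) / parallelizationLevel;
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            for (int from = 0; from < data.length; from += chunk) {
                final int lo = from;
                final int hi = Math.min(data.length, from + chunk);
                parts.add(executor.submit(new Callable<Double>() {
                    public Double call() {
                        return sumSquares(data, lo, hi);
                    }
                }));
            }
            double total = 0;
            for (Future<Double> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

A caller who does not want the library to spawn threads would call compute(data, 1); anyone else picks a higher level. A real design would probably also let the user pass in their own ExecutorService instead of creating a pool per call.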
>>
>> Gilles
>>
>>> Phil
>>>> If the requirement is to only depend on the standard JDK: the framework is in
>>>>   java.util.concurrent
>>>> and all we need to do is to define "tasks" that can be submitted to an executor:
>>>>
>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/AbstractExecutorService.html#submit(java.util.concurrent.Callable)
>>>>
>>>> Regards,
>>>> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org