On 1/16/15 2:09 AM, Thomas Neidhart wrote:
> On 01/16/2015 01:30 AM, Gilles wrote:
>> On Thu, 15 Jan 2015 15:41:11 -0700, Phil Steitz wrote:
>>> On 1/15/15 2:24 PM, Thomas Neidhart wrote:
>>>> On 01/08/2015 12:34 PM, Gilles wrote:
>>>>> Hi.
>>>>>
>>>>> Raising this issue once again.
>>>>> Are we going to upgrade the requirement for the next major release?
>>>>>
>>>> [ ] Java 5
>>>> [x] Java 6
>>>> [x] Java 7
>>>> [ ] Java 8
>>>> [ ] Java 9
>>>>
>>>> A while ago I thought that it would be cool to switch to Java 7/8 for
>>>> some of the nice new features (mainly fork/join, lambda expressions and
>>>> diamond operator, the rest is more or less unimportant for math imho).
>>>>
>>>> But after some thoughts I think they are not really needed for the
>>>> following reasons:
>>>>
>>>> * the main focus of math is on developing high-quality, well tested and
>>>> documented algorithms, the existing language features are more than
>>>> enough for this
>> Sure.
>> Not so long ago, some people were claiming that nothing beats
>> programming in "assembly" language.
>>
>>> +1
>>>> * coming up with multi-threaded algorithms might be appealing but it is
>>>> also hard work and I wonder if it really makes sense in the times of
>>>> projects like mahout / hadoop / ... which aim for even better
>>>> scalability
>>> +1
>> Hard work / easy work. Yes and no. It depends on the motivation
>> of the contributor. Or we have to (re)define clearly the scope of
>> CM, and start some serious clean-up.
>> It's not all black or white; I'm quite convinced that it's better
>> to handle multi-threading externally when the core computation is
>> sequential. But CM already contains algorithms that are inherently
>> parallel (a.o. genetic algorithms) and improvement in those areas
>> would undoubtedly benefit from (internal) parallel processing.
> I think the better approach is to support external parallelization
> rather than trying to do it yourself.
> From a user POV, I would be scared
> to use a library that does some kind of parallelization internally which
> I can not control.
+1
>
> Some recent examples show how it can be done better: there were some
> requests to make some of the statistics-related classes map/reducible so
> that they can be used in Java 8 parallel streams.

+1 - mostly done.
>
> @genetic algorithms: there are far better libraries out there for
> this area and the support we have in math is really very simplistic. You
> can basically do just a few demo examples with it and I am more in favor
> of deprecating the package.

Agreed there is better stuff out there, but I like the structure of
what we have (weak as the capabilities may be). I have often thought
about playing with replacing the GeneticAlgorithm and Population
implementations with M/R-capable things. I bet this could be done
without changing our API at all - just using the lower-level
constructs in a distributed execution environment. I have not
actually done this so am not sure it would work; but I don't see why
not. This still leaves gaps in encoding, etc; but those could be
filled over time. I would be -0 on deprecating the package, partly
because I am a user of it :)

Phil
>
>>> My HO is we should focus on getting the best single-threaded
>>> implementations we can and, where possible, setting things up to be
>>> executed in parallel by other engines. Spawning and managing
>>> threads internal to [math] actually *reduces* the range of
>>> applicability of our stuff.
>> Examples?
> because not everybody wants a library to do parallel stuff internally.
> Just imagine math being used in a web-application deployed together with
> many other applications. It is clearly not an option that one
> application might take over most/all of the available processors.
>
>>> Much better to let Hadoop / Mahout et
>>> al parallelize using fast and accurate piece parts that we can
>>> provide.
>> Do they really do that?
>> [Or do they implement their own algorithms knowing that they must
>> be thread-safe (which is something we don't focus a lot on).]
> I guess they have mainly their own algorithms, but there are examples of
> our stuff being used (using the map/reduce paradigm).
>
>>> If there are parallel algorithms that we are really dying
>>> to implement directly, I would rather see that done in a way that
>>> encapsulates and enables externalization of the thread management.
>>>> * staying at Java 6/7 does not block users from using math in a Java 8
>>>> environment if wanted
>>> +1 - the examples I have seen thus far are all things that could be
>>> done fairly easily with client code. I know we don't all agree with
>>> this, but I think the biggest service we can provide to our user
>>> base is good, tested, supported implementations of standard
>>> algorithms. I wish we could find a way to focus more on that and
>>> less on fiddling with the API or language features.
> +1, I have the impression that the more we try to *optimize* an API, the
> more we end up with an inferior solution (with a few exceptions).
>
> There is too much discussion about API design. We should have our best
> practices and use them to implement rock-solid algorithms, which is
> already difficult enough. In the end it does not matter so much if you
> have a fluent API or whatever, as long as it calculates the correct
> result, and is easy to use, imho.
>
>> The problem is that those discussions constantly mix considerations
>> about contents, with political moves that do not necessarily match.
>> For example, a statement about contents would be: CM only provides
>> implementations of sequential mathematical algorithms.
>> But recent political moves, like changing the version control system
>> or advertizing "free for all" commit rights, aim at increasing the
>> contributor base.
> I think these considerations are orthogonal:
>
> * what you want to do? aka scope of the projects
> * how you want to do it?
> * what infrastructure do you provide to your users/collaborators
>
>> What about those people interested in API fixing and new language
>> features? You'll make them want to contribute to another project.
>> Now that Java is, at last, beginning to catch up with other
>> languages incomparably more widely used in the scientific community,
>> Commons Math is discussing how far behind it is going to lag!
> Afaik the scientific community uses mainly python with its abundance of
> great tools. I think Java is better suited in an engineering context.
>
> Thomas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
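
For illustration, the "external parallelization" idea discussed in this thread (single-threaded statistics code that a Java 8 parallel stream can map and reduce) might look roughly like the sketch below. It does not use Commons Math at all: MeanAccumulator, summarize and ExternalParallelismSketch are hypothetical names invented for this example, and the only real API assumed is the JDK's DoubleStream.collect(supplier, accumulator, combiner). The accumulator spawns no threads itself, and the caller decides whether to pass a sequential or a parallel stream.

```java
import java.util.stream.DoubleStream;

// Hypothetical sketch; none of these class or method names come from Commons Math.
class ExternalParallelismSketch {

    /** Single-threaded running mean that can be merged with another instance. */
    static final class MeanAccumulator {
        private long n;
        private double mean;

        /** "Map" side: fold one value into this partial result. */
        void accept(double x) {
            n++;
            mean += (x - mean) / n;
        }

        /** "Reduce" side: merge another partial result into this one. */
        void combine(MeanAccumulator other) {
            if (other.n == 0) {
                return; // nothing to merge
            }
            long total = n + other.n;
            mean += (other.mean - mean) * other.n / total;
            n = total;
        }

        long getN() {
            return n;
        }

        double getMean() {
            return n == 0 ? Double.NaN : mean;
        }
    }

    /** No threads are created here; the caller controls how the stream runs. */
    static MeanAccumulator summarize(DoubleStream values) {
        return values.collect(MeanAccumulator::new,
                              MeanAccumulator::accept,
                              MeanAccumulator::combine);
    }

    public static void main(String[] args) {
        // Same library code, two execution strategies chosen by the client.
        MeanAccumulator seq = summarize(DoubleStream.of(1.0, 2.0, 3.0, 4.0));
        MeanAccumulator par = summarize(DoubleStream.of(1.0, 2.0, 3.0, 4.0).parallel());
        System.out.println(seq.getN() + " values, mean " + seq.getMean());
        System.out.println(par.getN() + " values, mean " + par.getMean());
    }
}
```

The design point is only that the mergeable accumulator is the "piece part" and thread management stays with the client code or an external engine. Whether this matches how the statistics classes were actually adapted is not stated in the thread.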