On 4/17/15 9:01 AM, Gilles wrote:
> On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
>> On 4/17/15 3:14 AM, Gilles wrote:
>>> Hello.
>>>
>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>> Consider me poked!
>>>>
>>>> So, the Java answer to "how do I run things in multiple threads"
>>>> is to use an Executor (java.util).  This doesn't necessarily mean
>>>> that you *have* to use a separate thread (the implementation could
>>>> execute inline).  However, in order to accommodate the separate
>>>> thread case, you would need to code to a Future-like API.  Now,
>>>> I'm not saying to use Executors directly, but I'd provide some
>>>> abstraction layer above them or in lieu of them, something like:
>>>>
>>>> public interface ExecutorThingy {
>>>>   Future<T> execute(Function<T> fn);
>>>> }
>>>>
>>>> One could imagine implementing different ExecutorThingy
>>>> implementations which allow you to parallelize things in different
>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>
>>> I did not understand what is being suggested: parallelization of a
>>> single algorithm or concurrent calls to multiple instances of an
>>> algorithm?
>>
>> Really both.  It's probably best to look at some concrete examples.
>
> Certainly...
>
>> The two I mentioned in my apachecon talk are:
>>
>> 1. Threads managed by some external process / application gathering
>> statistics to be aggregated.
>>
>> 2. Allowing multiple threads to concurrently execute GA
>> transformations within the GeneticAlgorithm "evolve" method.
>
> I could not view the presentation from the link previously mentioned
> (it did not work with my browser...).
> Can I download the PDF file from somewhere?

Sorry.  Try this (unshortened) link

http://www.slideshare.net/psteitz/commons-mathapacheconna2015
>
>> It would be instructive to think about how to handle both of these
>> use cases using something like what James is suggesting.  What is
>> nice about his idea is that it could give us a way to let users /
>> systems decide whether they want to have [math] algorithms spawn
>> threads to execute concurrently or to allow an external execution
>> framework to handle task distribution across threads.
>
> Some (all?) cases of "external" parallelism are trivial for the CM
> developers: the user must chop his data, pass the chunks as arguments
> to the CM methods, then collect and reassemble the results, all by
> himself.
> IIUC the scenario, this cannot be deemed a "feature".

The idea is to make it easier for users to do this "chopping" and
"reassembling" and / or to let these operations be managed by external
frameworks.  The AggregatedStatistics class is a simple example of
making it easier for users to do directly.
>
>> Since 2. above is a good example of "internal" parallelism and it
>> also has data sharing / transfer challenges, maybe its best to start
>> with that one.
>
> That's the scenario where usage is simple and performance can match
> the user's machine capability when running CM algorithms that are
> inherently parallel.
>
> There is an example in CM: see
>   testTravellerSalesmanSquareTourParallelSolver()
> in
>   org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest

The challenge is how to make this kind of thing possible "simply"
without just pegging the local machine's cores in an unmanaged way.  I
think James has the kernel of an idea that would allow us to have it
both ways - "greedy / local" or "managed / remotable."  This is all
hand-waving at this point; but the idea that we could find a way to
make our parallelizable algorithms executable via locally spawned
threads or external task managers is appealing.
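
To make the "have it both ways" idea a bit more concrete, here is a
minimal sketch of the kind of abstraction James describes.  It is only
an illustration under assumptions: TaskRunner, InlineRunner,
PooledRunner and sumOfSquares are hypothetical names, not existing
[math] API, and the interface is James's ExecutorThingy with the type
parameter declared and java.util.concurrent.Callable standing in for
his Function.  The algorithm code only sees TaskRunner; whether the
work runs in the calling thread or on a pool is decided by whoever
supplies the runner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class TaskRunnerSketch {

    /** Hypothetical abstraction; not an existing Commons Math API. */
    public interface TaskRunner {
        <T> Future<T> execute(Callable<T> task);
    }

    /** Runs each task immediately in the calling thread. */
    public static final class InlineRunner implements TaskRunner {
        @Override
        public <T> Future<T> execute(Callable<T> task) {
            CompletableFuture<T> result = new CompletableFuture<>();
            try {
                result.complete(task.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
            return result;
        }
    }

    /** Delegates to an ExecutorService owned and sized by the caller. */
    public static final class PooledRunner implements TaskRunner {
        private final ExecutorService pool;

        public PooledRunner(ExecutorService pool) {
            this.pool = pool;
        }

        @Override
        public <T> Future<T> execute(Callable<T> task) {
            return pool.submit(task);
        }
    }

    /**
     * Stand-in for a parallelizable algorithm step: one task per chunk,
     * partial results combined once all tasks have finished.
     */
    public static long sumOfSquares(TaskRunner runner, int[][] chunks)
            throws Exception {
        List<Future<Long>> partials = new ArrayList<>();
        for (int[] chunk : chunks) {
            Callable<Long> task = () -> {
                long sum = 0;
                for (int v : chunk) {
                    sum += (long) v * v;
                }
                return sum;
            };
            partials.add(runner.execute(task));
        }
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get(); // blocks until that chunk is done
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        int[][] chunks = {{1, 2}, {3}, {4, 5}};
        System.out.println(sumOfSquares(new InlineRunner(), chunks));

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(sumOfSquares(new PooledRunner(pool), chunks));
        } finally {
            pool.shutdown();
        }
    }
}
```

Taking the ExecutorService from outside is what would let an
application cap or share the thread budget instead of having the
library peg the cores on its own.  A "remotable" runner would
presumably implement the same interface, but it would also have to
ship the task and its data to another process, which this sketch does
not attempt.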
>
>> I have just started thinking about this and would
>> love to get better ideas than my own hacking about how to do it
>>
>> a) Using Spark with RDD's to maintain population state data
>> b) Hadoop with HDFS (or something else?)
>
> I have zero experience with this but I'm interested to know more. :-)

I am also just learning Spark.  It will likely take me a while to get
something meaningful; but I will start playing with this.  Other ideas
/ patches welcome!

Phil
>
> Regards,
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org