On Fri, 17 Apr 2015 08:35:42 -0700, Phil Steitz wrote:
On 4/17/15 3:14 AM, Gilles wrote:
Hello.

On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
Consider me poked!

So, the Java answer to "how do I run things in multiple threads" is
to use an Executor (java.util.concurrent). This doesn't necessarily
mean that you *have* to use a separate thread (the implementation
could execute inline). However, in order to accommodate the
separate-thread case, you would need to code to a Future-like API.
Now, I'm not saying to use Executors directly, but I'd provide some
abstraction layer above them, or in lieu of them, something like:

public interface ExecutorThingy {
  <T> Future<T> execute(Callable<T> fn);
}

One could imagine different ExecutorThingy implementations that
parallelize things in different ways (simple threads, JMS, Akka,
etc.).
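
For illustration only, here is a rough sketch of two such
implementations (the class names are placeholders, and I'm assuming
the execute method above takes a Callable<T>): one runs the task
inline on the calling thread, the other delegates to a
java.util.concurrent ExecutorService:

  import java.util.concurrent.Callable;
  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Future;

  // Runs the task on the calling thread; no extra threads, same API.
  class InlineExecutorThingy implements ExecutorThingy {
    public <T> Future<T> execute(Callable<T> fn) {
      try {
        return CompletableFuture.completedFuture(fn.call());
      } catch (Exception e) {
        CompletableFuture<T> failed = new CompletableFuture<T>();
        failed.completeExceptionally(e);
        return failed;
      }
    }
  }

  // Delegates to a caller-supplied thread pool (or any ExecutorService).
  class PooledExecutorThingy implements ExecutorThingy {
    private final ExecutorService pool;

    PooledExecutorThingy(ExecutorService pool) {
      this.pool = pool;
    }

    public <T> Future<T> execute(Callable<T> fn) {
      return pool.submit(fn);
    }
  }

Either way, the calling code only sees the interface; whether anything
actually runs concurrently is decided by whoever constructs the
ExecutorThingy.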

I did not understand what is being suggested: parallelization of a
single algorithm or concurrent calls to multiple instances of an
algorithm?

Really both.  It's probably best to look at some concrete examples.

Certainly...

The two I mentioned in my apachecon talk are:

1.  Threads managed by some external process / application gathering
statistics to be aggregated.

2.  Allowing multiple threads to concurrently execute GA
transformations within the GeneticAlgorithm "evolve" method.

I could not view the presentation from the link previously mentioned
(it did not work with my browser...).
Can I download the PDF file from somewhere?

It would be instructive to think about how to handle both of these
use cases using something like what James is suggesting.  What is
nice about his idea is that it could give us a way to let users /
systems decide whether they want to have [math] algorithms spawn
threads to execute concurrently or to allow an external execution
framework to handle task distribution across threads.
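
As an illustration of that split, a [math]-style algorithm could be
written purely against the abstraction, so the caller decides what
actually happens. Sketch only: ParallelEvaluator is a made-up name,
and the math3 import is used because the math4 package names may still
change.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.Future;

  import org.apache.commons.math3.analysis.UnivariateFunction;

  // Hypothetical component coded only against ExecutorThingy.
  class ParallelEvaluator {
    private final ExecutorThingy executor;

    ParallelEvaluator(ExecutorThingy executor) {
      this.executor = executor;
    }

    // Evaluates f at each point, one task per point.
    double[] evaluate(final UnivariateFunction f,
                      final double[] points) throws Exception {
      List<Future<Double>> results = new ArrayList<Future<Double>>();
      for (final double x : points) {
        results.add(executor.execute(new Callable<Double>() {
          public Double call() {
            return f.value(x);
          }
        }));
      }
      double[] y = new double[points.length];
      for (int i = 0; i < y.length; i++) {
        y[i] = results.get(i).get();
      }
      return y;
    }
  }

Passing the inline implementation gives today's single-threaded
behaviour; passing a pooled one (or one backed by an external
framework) parallelizes the evaluations without touching the
algorithm.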

Some (all?) cases of "external" parallelism are trivial for the CM
developers: the user must chop his data, pass the chunks as arguments
to the CM methods, then collect and reassemble the results, all by
himself.
IIUC the scenario, this cannot be deemed a "feature".
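
For the record, that user-side pattern boils down to something like
the following sketch (using the math3 stat classes; the aggregation
step assumes AggregateSummaryStatistics.aggregate, and the math4
package names may differ):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  import org.apache.commons.math3.stat.descriptive.AggregateSummaryStatistics;
  import org.apache.commons.math3.stat.descriptive.StatisticalSummary;
  import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

  public class ParallelStatsExample {
    public static void main(String[] args) throws Exception {
      // Pretend these are the chunks the user chopped his data into.
      double[][] chunks = {
        {1.0, 2.0, 3.0},
        {4.0, 5.0, 6.0},
        {7.0, 8.0, 9.0}
      };

      ExecutorService pool = Executors.newFixedThreadPool(chunks.length);
      List<Future<SummaryStatistics>> futures =
        new ArrayList<Future<SummaryStatistics>>();
      for (final double[] chunk : chunks) {
        futures.add(pool.submit(new Callable<SummaryStatistics>() {
          public SummaryStatistics call() {
            // Each worker accumulates statistics over its own chunk only.
            SummaryStatistics stats = new SummaryStatistics();
            for (double v : chunk) {
              stats.addValue(v);
            }
            return stats;
          }
        }));
      }

      // Collect the per-thread summaries and combine them.
      List<StatisticalSummary> partials = new ArrayList<StatisticalSummary>();
      for (Future<SummaryStatistics> f : futures) {
        partials.add(f.get());
      }
      pool.shutdown();

      StatisticalSummary combined = AggregateSummaryStatistics.aggregate(partials);
      System.out.println("mean = " + combined.getMean() + ", n = " + combined.getN());
    }
  }

All the chopping, submitting and reassembling happens in user code; CM
only ever sees one chunk at a time.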

Since 2. above is a good example of "internal" parallelism and it
also has data-sharing / transfer challenges, maybe it's best to start
with that one.
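
To make the shape of that concrete, here is a very rough sketch of a
nextGeneration-like step where each pair of offspring is produced by
its own task. This is not how GeneticAlgorithm currently works; it
ignores the crossover/mutation rates and glosses over the fact that
the selection policy and the shared RandomGenerator would need to be
thread-safe, which is exactly the data-sharing challenge mentioned
above:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Future;

  import org.apache.commons.math3.genetics.Chromosome;
  import org.apache.commons.math3.genetics.ChromosomePair;
  import org.apache.commons.math3.genetics.CrossoverPolicy;
  import org.apache.commons.math3.genetics.MutationPolicy;
  import org.apache.commons.math3.genetics.Population;
  import org.apache.commons.math3.genetics.SelectionPolicy;

  // Sketch only: shows where an executor could be plugged into an
  // evolve-like loop; it is not the current GeneticAlgorithm code.
  class ParallelGenerationStep {
    Population nextGeneration(final Population current,
                              final SelectionPolicy selection,
                              final CrossoverPolicy crossover,
                              final MutationPolicy mutation,
                              ExecutorService pool) throws Exception {
      // Population for the next generation (may already hold elite members).
      Population next = current.nextGeneration();
      int needed = next.getPopulationLimit() - next.getPopulationSize();
      int pairs = (needed + 1) / 2;

      List<Future<ChromosomePair>> tasks = new ArrayList<Future<ChromosomePair>>();
      for (int i = 0; i < pairs; i++) {
        tasks.add(pool.submit(new Callable<ChromosomePair>() {
          public ChromosomePair call() {
            // Selection, crossover and mutation for one pair of offspring.
            // (Randomness omitted: both operators are always applied here.)
            ChromosomePair pair = selection.select(current);
            pair = crossover.crossover(pair.getFirst(), pair.getSecond());
            Chromosome c1 = mutation.mutate(pair.getFirst());
            Chromosome c2 = mutation.mutate(pair.getSecond());
            return new ChromosomePair(c1, c2);
          }
        }));
      }

      // Reassemble the offspring into the new population.
      for (Future<ChromosomePair> f : tasks) {
        ChromosomePair pair = f.get();
        if (next.getPopulationSize() < next.getPopulationLimit()) {
          next.addChromosome(pair.getFirst());
        }
        if (next.getPopulationSize() < next.getPopulationLimit()) {
          next.addChromosome(pair.getSecond());
        }
      }
      return next;
    }
  }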

That's the scenario where usage is simple and performance can match
the user's machine capability when running CM algorithms that are
inherently parallel.

There is an example in CM: see
  testTravellerSalesmanSquareTourParallelSolver()
in
  org.apache.commons.math4.ml.neuralnet.sofm.KohonenTrainingTaskTest

I have just started thinking about this and would love to get better
ideas than my own hacking about how to do it:

a) Using Spark with RDDs to maintain population state data
b) Hadoop with HDFS (or something else?)

I have zero experience with this but I'm interested to know more. :-)

Regards,
Gilles

