On 22.07.2012 21:01, Ted Dunning wrote:
I don't believe that there are any commons math algorithms that would
benefit from execution in a Hadoop map-reduce style.  The issue is that
iterative algorithms are essentially incompatible with the very large
startup costs of map-reduce programs under Hadoop.

Some algorithms can be recast to make use of an all-reduce operator which
can be implemented in a map-only job.  EM algorithms often have this
structure.
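Here is a rough shared-memory picture of the all-reduce pattern, assuming Java 8 and plain java.util.concurrent. Each worker reduces its own shard to partial sufficient statistics (here just a sum and a count). A CyclicBarrier combines them once every worker has arrived, and every worker then reads the same totals and makes the same parameter update. In an EM iteration the partial statistics would come from the E-step and the update would be the M-step. The class and method names are made up for illustration. This is neither a Hadoop job nor Commons Math code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Commons Math: a shared-memory analogue
// of an all-reduce step inside an iterative algorithm.
final class AllReduceSketch {

    private AllReduceSketch() {}

    /**
     * Each worker summarises its own shard as (sum, count), the partial
     * statistics are all-reduced, and every worker then derives the same
     * global mean. Returns the mean as seen by each worker.
     */
    static double[] globalMeanPerWorker(double[][] shards) throws Exception {
        final int workers = shards.length;
        final double[][] partial = new double[workers][2]; // {sum, count}
        final double[] total = new double[2];

        // Runs once, after every worker has called await(): the "reduce" half.
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            for (double[] p : partial) {
                total[0] += p[0];
                total[1] += p[1];
            }
        });

        // One thread per worker; fewer threads would deadlock at the barrier.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int id = w;
                futures.add(pool.submit(() -> {
                    // Local step: sufficient statistics for this shard only.
                    for (double x : shards[id]) {
                        partial[id][0] += x;
                        partial[id][1] += 1;
                    }
                    barrier.await(); // wait until the totals have been combined
                    // Every worker sees the same totals and makes the same update.
                    return total[0] / total[1];
                }));
            }
            double[] means = new double[workers];
            for (int w = 0; w < workers; w++) {
                means[w] = futures.get(w).get();
            }
            return means;
        } finally {
            pool.shutdown();
        }
    }
}
```

On a cluster the barrier would be replaced by a network all-reduce between long-lived map tasks. The point is only that each iteration needs one combine step rather than a whole new map-reduce job.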

Otherwise, massive algorithmic change is usually necessary.  For instance,
partial SVD can be done using a fixed and small number of map-reduce
operations by using stochastic projection.
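The idea behind stochastic projection can be sketched in plain Java. Multiplying A by a random Gaussian matrix Omega with k columns gives Y = A * Omega. After orthonormalisation into Q, the columns of Y approximately span the dominant k-dimensional range of A, and the small k x n matrix B = Q^T A then carries the leading singular values. Y is computed row by row, with each row of A handled independently (a map), and B is a sum of per-row contributions (a reduce). That is why only a fixed number of passes over A is needed. This is the basic randomized range finder without oversampling or power iterations. The names are hypothetical, and the sketch assumes k does not exceed the rank of A (otherwise Gram-Schmidt divides by zero).

```java
import java.util.Random;

// Hypothetical illustration of stochastic projection (a basic randomized
// range finder); not Commons Math or Mahout code.
final class StochasticProjectionSketch {

    private StochasticProjectionSketch() {}

    /** Y = A * Omega. Row i of Y needs only row i of A, so this is a "map". */
    static double[][] multiply(double[][] a, double[][] omega) {
        int m = a.length, n = omega.length, k = omega[0].length;
        double[][] y = new double[m][k];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < k; j++) {
                double s = 0;
                for (int t = 0; t < n; t++) {
                    s += a[i][t] * omega[t][j];
                }
                y[i][j] = s;
            }
        }
        return y;
    }

    /** Orthonormalises the columns of y in place (modified Gram-Schmidt). */
    static void orthonormalizeColumns(double[][] y) {
        int m = y.length, k = y[0].length;
        for (int j = 0; j < k; j++) {
            for (int p = 0; p < j; p++) {
                double dot = 0;
                for (int i = 0; i < m; i++) dot += y[i][p] * y[i][j];
                for (int i = 0; i < m; i++) y[i][j] -= dot * y[i][p];
            }
            double norm = 0;
            for (int i = 0; i < m; i++) norm += y[i][j] * y[i][j];
            norm = Math.sqrt(norm); // zero if k exceeds rank(A): not handled here
            for (int i = 0; i < m; i++) y[i][j] /= norm;
        }
    }

    /** Q: an m x k orthonormal basis for the approximate dominant range of A. */
    static double[][] rangeBasis(double[][] a, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] omega = new double[a[0].length][k];
        for (double[] row : omega) {
            for (int j = 0; j < k; j++) row[j] = rnd.nextGaussian();
        }
        double[][] q = multiply(a, omega);
        orthonormalizeColumns(q);
        return q;
    }

    /** B = Q^T A (k x n): a sum of per-row contributions, i.e. a "reduce". */
    static double[][] projectOnto(double[][] q, double[][] a) {
        int m = a.length, n = a[0].length, k = q[0].length;
        double[][] b = new double[k][n];
        for (int i = 0; i < m; i++) {
            for (int p = 0; p < k; p++) {
                for (int j = 0; j < n; j++) b[p][j] += q[i][p] * a[i][j];
            }
        }
        return b;
    }
}
```

Any dense routine can then compute the SVD of the small matrix B, for example Commons Math's SingularValueDecomposition. The left singular vectors of A are recovered as Q times those of B.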

Threaded execution, on the other hand, can be very, very helpful for a
number of math algorithms and thread management inside commons math is a
very reasonable option in those cases.  This would provide a performance
boost with very little complexity for the user of math.  Managing these
threads is really pretty simple as well.
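The following is a minimal sketch of library-managed threads, using a hypothetical class and Java 8 syntax. It is not existing Commons Math API. A single pool sized to the number of processors is created once, and its threads are daemons, so an idle pool never keeps the JVM alive. A matrix-vector product is split into contiguous blocks of rows. Each task writes a disjoint slice of the result, so no locking is required. Waiting on the futures both blocks until the work is done and makes the writes visible to the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of threads managed inside the library; not Commons Math API.
final class ThreadedMatrixOps {
    private static final int THREADS = Runtime.getRuntime().availableProcessors();

    // One pool shared by the whole library. Daemon threads mean an idle pool
    // never prevents the JVM from exiting.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(THREADS, r -> {
        Thread t = new Thread(r, "math-worker");
        t.setDaemon(true);
        return t;
    });

    private ThreadedMatrixOps() {}

    /** y = A x, with each task computing a contiguous block of rows of y. */
    static double[] multiply(double[][] a, double[] x)
            throws InterruptedException, ExecutionException {
        double[] y = new double[a.length];
        int block = Math.max(1, (a.length + THREADS - 1) / THREADS);
        List<Future<?>> pending = new ArrayList<>();
        for (int start = 0; start < a.length; start += block) {
            final int lo = start, hi = Math.min(a.length, start + block);
            pending.add(POOL.submit(() -> {
                for (int i = lo; i < hi; i++) {
                    double s = 0;
                    for (int j = 0; j < x.length; j++) s += a[i][j] * x[j];
                    y[i] = s; // rows are disjoint, so no locking is needed
                }
            }));
        }
        for (Future<?> f : pending) f.get(); // wait; also publishes the writes to y
        return y;
    }
}
```

For small matrices the task overhead will outweigh the gain, so a real implementation would presumably fall back to a sequential loop below some size threshold.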


How about the Fork-Join framework of Java 7 as an alternative?
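For comparison, a Fork/Join version of a dot product, written against the Java 7 API only (no lambdas), might look like the sketch below. The range is split in half recursively. One half is forked onto the pool while the current thread computes the other, and below a threshold the task simply loops. The class name and the threshold value are illustrative guesses, neither tuned nor taken from any library.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class using only the Java 7 Fork/Join API.
class DotProductTask extends RecursiveTask<Double> {
    private static final long serialVersionUID = 1L;
    // Below this many elements the task just loops; the value is a guess.
    private static final int THRESHOLD = 8192;
    // One shared pool; the no-arg constructor sizes it to the available processors.
    private static final ForkJoinPool POOL = new ForkJoinPool();

    private final double[] a;
    private final double[] b;
    private final int lo;
    private final int hi;

    DotProductTask(double[] a, double[] b, int lo, int hi) {
        this.a = a;
        this.b = b;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {
            double s = 0;
            for (int i = lo; i < hi; i++) {
                s += a[i] * b[i];
            }
            return s;
        }
        int mid = (lo + hi) >>> 1;
        DotProductTask left = new DotProductTask(a, b, lo, mid);
        left.fork(); // schedule the left half asynchronously
        double right = new DotProductTask(a, b, mid, hi).compute(); // right half here
        return right + left.join();
    }

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        return POOL.invoke(new DotProductTask(a, b, 0, a.length));
    }
}
```

Splitting changes the order of the floating-point additions, so results can differ in the last bits from a sequential loop.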

Well, you probably don't want to switch to Java 7 now, but maybe in a later version? And I think there are back-ports for earlier Java versions.

Oliver


On Sun, Jul 22, 2012 at 9:27 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

On 7/21/12 6:17 AM, Gilles Sadowski wrote:
Hi.

My previous post (with subject "Synchronisation") made me think (again) that
it might be useful to start considering how to take advantage of
multi-threading in Commons Math.
Indeed, it seems that some parts of the library might end up not being used
anymore because their performance simply cannot match competing
implementations that do benefit from parallelization. [The recent example
that comes to mind is the FFT.]

This is an interesting question.  I am also -1 on adding
dependencies, but it would be a good idea to look at how others have
solved the problem of how to support parallel execution by multiple
threads without managing threads directly.  Lots of [math]
algorithms could be parallelized.  The question is how to
effectively coordinate the work without owning or creating the
workers.  I would be -0 to any suggestion that involved [math]
itself spawning threads, since that 0) creates management headaches
1) may violate some container contracts and 2) forces execution
threads to be in the same process.  I think it is worth thinking
about how we might support parallel execution by externally managed
workers.  An obvious thing to look at is how to break our
parallelizable algorithms into pieces that could be executed in
Hadoop Map/Reduce jobs.  Step 0) is the breaking up part.  Then step
1) might be either some examples added to the user guide or custom
Pig functions (or examples of how to code them).
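The following sketch combines "step 0" with externally managed workers. The algorithm is rewritten as a per-chunk summary plus an associative merge, and the library only asks the caller for a java.util.concurrent.Executor, so it never creates threads itself. Each chunk is reduced to (count, mean, M2) with Welford's update, and the partial results are combined with the pairwise formula of Chan, Golub and LeVeque. Moments and ParallelMoments are hypothetical names, not Commons Math classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

// Hypothetical mergeable summary: count, mean and sum of squared deviations (M2).
final class Moments {
    final long n;
    final double mean;
    final double m2;

    Moments(long n, double mean, double m2) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
    }

    /** Welford's single-pass update over x[lo, hi). */
    static Moments of(double[] x, int lo, int hi) {
        long n = 0;
        double mean = 0, m2 = 0;
        for (int i = lo; i < hi; i++) {
            n++;
            double d = x[i] - mean;
            mean += d / n;
            m2 += d * (x[i] - mean);
        }
        return new Moments(n, mean, m2);
    }

    /** Pairwise combination of two partial summaries (associative merge). */
    Moments merge(Moments o) {
        if (o.n == 0) return this;
        if (n == 0) return o;
        long total = n + o.n;
        double d = o.mean - mean;
        double newMean = mean + d * o.n / total;
        double newM2 = m2 + o.m2 + d * d * ((double) n * o.n / total);
        return new Moments(total, newMean, newM2);
    }

    /** Bias-corrected sample variance. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }
}

// Hypothetical entry point: the caller owns the workers, the library owns the split.
final class ParallelMoments {
    private ParallelMoments() {}

    /** Splits x into at most 'chunks' pieces (chunks >= 1) and runs them on 'executor'. */
    static Moments compute(double[] x, int chunks, Executor executor)
            throws InterruptedException, ExecutionException {
        int size = Math.max(1, (x.length + chunks - 1) / chunks);
        List<FutureTask<Moments>> tasks = new ArrayList<>();
        for (int lo = 0; lo < x.length; lo += size) {
            final int from = lo, to = Math.min(x.length, lo + size);
            FutureTask<Moments> task = new FutureTask<Moments>(() -> Moments.of(x, from, to));
            tasks.add(task);
            executor.execute(task); // the caller decides which thread runs it
        }
        Moments result = new Moments(0, 0, 0);
        for (FutureTask<Moments> t : tasks) {
            result = result.merge(t.get());
        }
        return result;
    }
}
```

Passing Runnable::run as the executor runs everything in the calling thread. Passing a container-managed pool spreads the chunks over its threads without [math] owning them. The same of/merge pair has the shape a Hadoop mapper and reducer (or a Pig UDF) would need, although that wiring is not shown.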

Phil


Best regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



