I think I got sidetracked when typing that email.  What I was trying to say
is that we need an abstraction layer above raw threads in order to allow for
different types of parallelism.  The Future abstraction is there to support
remote execution, where relying on side effects on shared state to return
results isn't good enough.
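
To make that a bit more concrete, here is a minimal sketch of the kind of
abstraction I mean (just an illustration, not a proposed API; I'm using
Callable<T> instead of the Function<T> from my earlier mail purely so the
sketch compiles against the plain JDK):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

/** Abstraction over "how work gets run"; algorithms only ever see Futures. */
public interface ExecutorThingy {

    /** Submit a unit of work and get a handle to its (possibly remote) result. */
    <T> Future<T> execute(Callable<T> task);

    /** Runs each task inline on the caller's thread: no extra threads at all. */
    class Inline implements ExecutorThingy {
        @Override
        public <T> Future<T> execute(Callable<T> task) {
            FutureTask<T> f = new FutureTask<>(task);
            f.run();                  // executes synchronously; result is immediately available
            return f;
        }
    }

    /** Delegates to a java.util.concurrent.ExecutorService (e.g. a thread pool). */
    class Pooled implements ExecutorThingy {
        private final ExecutorService pool;

        public Pooled(ExecutorService pool) {
            this.pool = pool;
        }

        @Override
        public <T> Future<T> execute(Callable<T> task) {
            return pool.submit(task); // may run on another thread, or behind a remoting proxy
        }
    }
}

The algorithm code only ever calls execute() and Future.get(); whether the
work runs inline, on a pool, via ForkJoin, or on a remote node is entirely
the implementation's concern.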

As for a concrete example, I suppose you could try Phil's idea of
parallelizing the genetic algorithm; there is a rough sketch of that below.
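
Roughly, the idea for the GA case would be something like this (again only a
sketch; Chromosome and computeFitness here are placeholders, not the actual
[math] genetics API): the fitness evaluation of each chromosome in a
generation is independent, so the evolve step can fan those evaluations out
through the abstraction above and collect the results before doing selection
and crossover.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class ParallelFitnessSketch {

    /** Placeholder for whatever represents a candidate solution. */
    interface Chromosome {
        double computeFitness();   // assumed side-effect free, so evaluations may run concurrently
    }

    /**
     * Evaluates a whole generation through the ExecutorThingy abstraction;
     * the caller decides whether that means the current thread, a thread
     * pool, a ForkJoinPool, or something remote.
     */
    static double[] evaluateGeneration(List<? extends Chromosome> population,
                                       ExecutorThingy executor)
            throws InterruptedException, ExecutionException {
        List<Future<Double>> pending = new ArrayList<>(population.size());
        for (Chromosome c : population) {
            pending.add(executor.execute(c::computeFitness));  // fan out
        }
        double[] fitness = new double[pending.size()];
        for (int i = 0; i < fitness.length; i++) {
            fitness[i] = pending.get(i).get();                 // collect; blocks until each task finishes
        }
        return fitness;
    }
}

Thomas's ForkJoin suggestion below fits the same shape: ForkJoinPool is
itself an ExecutorService, so a ForkJoin-backed implementation would just be
one more ExecutorThingy.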


On Saturday, April 18, 2015, Gilles <gil...@harfang.homelinux.org> wrote:

> On Fri, 17 Apr 2015 16:53:56 -0500, James Carman wrote:
>
>> Do you have any pointers to code for this ForkJoin mechanism?  I'm
>> curious to see it.
>>
>> The key thing you will need in order to support parallelization in a
>> generic way
>>
>
> What do you mean by "generic way"?
>
> I'm afraid that we may be trying to compare apples and oranges;
> each of us probably has in mind a "prototype" algorithm and an idea
> of how to implement it to make it run in parallel.
>
> I think that it would focus the discussion if we could
> 1. tell what the "prototype" is,
> 2. show a sort of pseudo-code of the difference between a sequential
>    and a parallel run of this "prototype" (i.e. what is the data, how
>    the (sub)tasks operate on them).
>
> Regards,
> Gilles
>
>  is to not tie it directly to threads, but use some
>> abstraction layer above threads, since that may not be the "worker"
>> method you're using at the time.
>>
>> On Fri, Apr 17, 2015 at 2:57 PM, Thomas Neidhart
>> <thomas.neidh...@gmail.com> wrote:
>>
>>> On 04/17/2015 05:35 PM, Phil Steitz wrote:
>>>
>>>> On 4/17/15 3:14 AM, Gilles wrote:
>>>>
>>>>> Hello.
>>>>>
>>>>> On Thu, 16 Apr 2015 17:06:21 -0500, James Carman wrote:
>>>>>
>>>>>> Consider me poked!
>>>>>>
>>>>>> So, the Java answer to "how do I run things in multiple threads"
>>>>>> is to
>>>>>> use an Executor (java.util).  This doesn't necessarily mean that you
>>>>>> *have* to use a separate thread (the implementation could execute
>>>>>> inline).  However, in order to accommodate the separate thread case,
>>>>>> you would need to code to a Future-like API.  Now, I'm not saying to
>>>>>> use Executors directly, but I'd provide some abstraction layer above
>>>>>> them or in lieu of them, something like:
>>>>>>
>>>>>> public interface ExecutorThingy {
>>>>>>   <T> Future<T> execute(Function<T> fn);
>>>>>> }
>>>>>>
>>>>>> One could imagine implementing different ExecutorThingy
>>>>>> implementations which allow you to parallelize things in different
>>>>>> ways (simple threads, JMS, Akka, etc, etc.)
>>>>>>
>>>>>
>>>>> I did not understand what is being suggested: parallelization of a
>>>>> single algorithm or concurrent calls to multiple instances of an
>>>>> algorithm?
>>>>>
>>>>
>>>> Really both.  It's probably best to look at some concrete examples.
>>>> The two I mentioned in my apachecon talk are:
>>>>
>>>> 1.  Threads managed by some external process / application gathering
>>>> statistics to be aggregated.
>>>>
>>>> 2.  Allowing multiple threads to concurrently execute GA
>>>> transformations within the GeneticAlgorithm "evolve" method.
>>>>
>>>> It would be instructive to think about how to handle both of these
>>>> use cases using something like what James is suggesting.  What is
>>>> nice about his idea is that it could give us a way to let users /
>>>> systems decide whether they want to have [math] algorithms spawn
>>>> threads to execute concurrently or to allow an external execution
>>>> framework to handle task distribution across threads.
>>>>
>>>
>>> I think a more viable option is to take advantage of the ForkJoin
>>> mechanism that we can now use in math 4.
>>>
>>> For example, the GeneticAlgorithm could quite easily be changed to use a
>>> ForkJoinTask to perform each evolution. I will try to come up with an
>>> example soon, as I plan to work on the genetics package anyway.
>>>
>>> The idea outlined above sounds nice but it is very unclear how an
>>> algorithm or function would perform its parallelization in such a way,
>>> and whether it would still be efficient.
>>>
>>> Thomas
>>>
>>>  Since 2. above is a good example of "internal" parallelism and it
>>>> also has data sharing / transfer challenges, maybe it's best to start
>>>> with that one.  I have just started thinking about this and would
>>>> love to get better ideas than my own hacking about how to do it:
>>>>
>>>> a) Using Spark with RDD's to maintain population state data
>>>> b) Hadoop with HDFS (or something else?)
>>>>
>>>> Phil
>>>>
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>>  [...]
>>>>>>>
>>>>>>
>
