Hello,
I have a task that runs on a week's worth of data (let's say) and
produces a Set of tuples, e.g. Set[(String, Long)] (essentially the
output of countByValue.toMap).
I want to produce 4 such sets, one for each of 4 different weeks, and
then take the intersection of the 4 sets.
I have the sequential approach working, but the 4 weeks are obviously
independent of each other in how they produce their sets (each works on
its own data), so the same job that produces the Set for one week could
just be run as 4 jobs in parallel, each with a different week start date.
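For reference, my sequential version looks roughly like this (a
simplified sketch; the weekSet helper, the HDFS path, and the week-start
dates are just placeholders, and sc is the usual SparkContext):

import org.apache.spark.SparkContext

// Produces the per-week set: count occurrences of each record in one
// week's worth of data and collect the counts to the driver.
def weekSet(sc: SparkContext, weekStart: String): Set[(String, Long)] =
  sc.textFile(s"hdfs:///data/$weekStart/*")   // one week of input data
    .countByValue()                           // Map[String, Long] on the driver
    .toSet

val weeks = Seq("2014-03-03", "2014-03-10", "2014-03-17", "2014-03-24")

// Runs the 4 jobs one after another, then intersects the results.
val common = weeks.map(w => weekSet(sc, w)).reduce(_ intersect _)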
How is this done in Spark? Is it the runJob() method on SparkContext?
Any example code anywhere?
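Concretely, is something along these lines the intended pattern, i.e.
submitting the jobs from separate threads/Futures and letting the
scheduler overlap them? (Again just a sketch; weekSet and the week dates
are the same placeholders as in the sequential snippet above.)

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val weeks = Seq("2014-03-03", "2014-03-10", "2014-03-17", "2014-03-24")

// Each Future submits an independent Spark job from its own thread,
// so the 4 weekly jobs can run concurrently on the cluster.
val futures = weeks.map(w => Future { weekSet(sc, w) })

// Wait for all 4 result sets, then intersect them on the driver.
val common = futures
  .map(f => Await.result(f, Duration.Inf))
  .reduce(_ intersect _)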
Thanks!
Ognen