Hello,

I have a task that runs on a week's worth of data (let's say) and produces a set of tuples, Set[(String, Long)] (essentially the output of countByValue turned into (value, count) pairs).
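
For concreteness, the per-week job is shaped roughly like this (the path, parsing, and names are placeholders, not the real code):

    import org.apache.spark.SparkContext

    def countsForWeek(sc: SparkContext, weekStart: String): Set[(String, Long)] =
      sc.textFile(s"hdfs:///logs/$weekStart/*")   // placeholder input path
        .map(_.split(",")(0))                     // placeholder: extract the value being counted
        .countByValue()                           // action: Map[String, Long] on the driver
        .toSet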

I want to produce 4 such sets, one for each of 4 different weeks, and then take the intersection of the 4 sets.

I have the sequential approach working, but obviously the 4 weeks are independent of each other in how they produce their sets (each works on its own data), so the same job that produces a set for one week could just as well be run as 4 jobs in parallel, each with a different week start date. A rough sketch of what I have in mind is below.
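
This is just a sketch of what I'm imagining, not something I know to be the sanctioned way: fire the four per-week jobs from separate threads against the same SparkContext using Futures, then intersect the results on the driver (weekStarts and countsForWeek are the made-up names from above):

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    val weekStarts = Seq("2014-03-03", "2014-03-10", "2014-03-17", "2014-03-24")  // made-up dates

    // Each Future submits an independent Spark job, assuming SparkContext
    // is safe to use from multiple threads for job submission.
    val futures = weekStarts.map(w => Future { countsForWeek(sc, w) })
    val weeklySets = Await.result(Future.sequence(futures), Duration.Inf)

    // Intersection of the four weekly sets
    val common: Set[(String, Long)] = weeklySets.reduce(_ intersect _)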

How is this done in Spark? Is it the runJob() method on SparkContext? Any example code anywhere?

Thanks!
Ognen
