Re: Parallelize independent tasks

2014-12-02 Thread Victor Tso-Guillen
dirs.par.foreach { case (src,dest) => sc.textFile(src).process.saveAsFile(dest) }

Is that sufficient for you?

On Tuesday, December 2, 2014, Anselme Vignon wrote:
> Hi folks,
>
> We have written a Spark job that scans multiple HDFS directories and
> performs transformations on them.
>
> For now

Parallelize independent tasks

2014-12-02 Thread Anselme Vignon
Hi folks,

We have written a Spark job that scans multiple HDFS directories and performs transformations on them.

For now, this is done with a simple for loop that starts one task at each iteration. This looks like:

dirs.foreach { case (src,dest) => sc.textFile(src).process.saveAsFile(dest) }

H
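For comparison with the sequential loop above and the `dirs.par.foreach` suggestion in the reply, here is a minimal sketch of the same idea using Scala futures on a bounded thread pool, which lets you cap how many directories are processed at once. It relies on Spark accepting jobs submitted from several threads of the same application. The names `ParallelDirs`, `runJob`, `runAll`, and `maxConcurrent` are hypothetical, and `runJob` is only a stand-in for `sc.textFile(src).process.saveAsFile(dest)` so that the sketch runs without a cluster.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelDirs {
  // Hypothetical stand-in for sc.textFile(src).process.saveAsFile(dest).
  // It only returns a description of the work, so no SparkContext is needed here.
  def runJob(src: String, dest: String): String = s"$src -> $dest"

  // Submit one independent job per (src, dest) pair, at most maxConcurrent at a time,
  // and block until all of them have finished.
  def runAll(dirs: Seq[(String, String)], maxConcurrent: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = dirs.map { case (src, dest) => Future(runJob(src, dest)) }
      // Future.sequence keeps the results in the same order as the input pairs.
      Await.result(Future.sequence(futures), 1.hour)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val dirs = Seq("/in/a" -> "/out/a", "/in/b" -> "/out/b")
    runAll(dirs, maxConcurrent = 2).foreach(println)
  }
}
```

Compared with `.par`, the explicit pool makes the degree of concurrency a visible parameter. Note that on Scala 2.13 and later, `.par` needs the separate scala-parallel-collections module, whereas futures are part of the standard library.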