Hi folks,
We have written a Spark job that scans multiple HDFS directories and
performs transformations on them.
For now, this is done with a simple for loop that submits one Spark job
per iteration. This looks like:
dirs.foreach { case (src, dest) => sc.textFile(src).process.saveAsTextFile(dest) }
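For the record, a self-contained version of what we run looks roughly
like this (a minimal sketch; the paths and the map step are hypothetical
stand-ins for our real "process" transformation):

    import org.apache.spark.{SparkConf, SparkContext}

    object BatchDirs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("batch-dirs"))

        // Hypothetical (src, dest) pairs of HDFS paths.
        val dirs = Seq(
          ("hdfs:///input/a", "hdfs:///output/a"),
          ("hdfs:///input/b", "hdfs:///output/b")
        )

        // Each saveAsTextFile action submits one Spark job; since foreach
        // runs on the driver, the jobs execute one after another.
        dirs.foreach { case (src, dest) =>
          sc.textFile(src)
            .map(_.trim)              // stand-in for the real transformation
            .saveAsTextFile(dest)
        }

        sc.stop()
      }
    }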
Hi,
Could your problem come from the fact that you run your tests in parallel?
If you are running Spark in local mode, you cannot have concurrent
SparkContext instances running. This means that your tests instantiating a
SparkContext cannot run in parallel. The easiest fix is to tell sbt not to
run tests in parallel.
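In build.sbt that would look something like this (a sketch; the exact
syntax depends on your sbt version):

    // Run test suites one at a time, so only a single SparkContext
    // is alive at any moment.
    parallelExecution in Test := false

    // Optionally fork a separate JVM for the tests, so Spark state
    // does not leak into the sbt process.
    fork in Test := true

With parallelExecution disabled, suites that each create their own
SparkContext run sequentially instead of fighting over the one local
context.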