Say you have all of your folder paths in a val folders: Seq[String]. Then:
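val add = sc.parallelize(folders, folders.size).mapPartitions { iter =>
  // One partition per folder, so each iterator holds exactly one path.
  val folder = iter.next()
  val status: Int = <call your executable with the folder path string>
  Iterator(status)
}
// Transformations are lazy; run an action such as add.collect() to start the tasks.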
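
For the placeholder in the snippet above, one possible way to call the executable is scala.sys.process from the standard library. This is only a rough sketch: runFolder is a hypothetical helper name, and it assumes the executable takes the folder path as its single argument and is installed at the same path on every worker node.

```scala
import scala.sys.process._

// Hypothetical helper: runs `<executable> <folder>` as an external process,
// blocks until it finishes, and returns its exit code.
// Assumption: the executable accepts the folder path as its only argument.
def runFolder(executable: String, folder: String): Int =
  Seq(executable, folder).!

// Inside mapPartitions, the placeholder line could then become
// (the executable path here is illustrative, not a real location):
//   val status: Int = runFolder("/path/to/your/executable", folder)
```

An exit code of 0 conventionally means success, so after collecting the results you can check which folders returned a non-zero status.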
> On Jun 30, 2016, at 16:42, Balachandar R.A. <[email protected]> wrote:
>
> Hello,
>
> I have about 100 folders, each containing 5 files. I have an executable
> that processes one folder. The executable is a black box, so it cannot be
> modified. I would like to process the 100 folders in parallel using Apache Spark,
> so that I can spawn one map task per folder. Can anyone give me an
> idea? I have come across similar questions, but for Hadoop, where the answer was to
> use CombineFileInputFormat and PathFilter. However, as I said, I want to use
> Apache Spark. Any ideas?
>
> Regards
> Bala
>