Hello,
The variable argsList is an array defined above the parallel block. This
variable is accessed inside the map function. Launcher.main is not thread-safe.
Is it not possible to tell Spark that every folder needs to be processed as a
separate process in a separate working directory?
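For concreteness, what I have in mind is roughly the following: fork one JVM per folder from inside the map function, so that Launcher.main runs in its own process with its own working directory. This is only a sketch; the jar path, main class name, and argument handling are placeholders, not my actual code.

import java.io.File;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

JavaRDD<String> results = jsc.parallelize(listFolders, listFolders.size())
        .map(new Function<String, String>() {
            @Override
            public String call(String folder) throws Exception {
                // Fork a separate JVM for this folder so the non-thread-safe
                // Launcher.main never runs twice inside the same process.
                ProcessBuilder pb = new ProcessBuilder(
                        "java", "-cp", "/path/to/app.jar", "com.example.Launcher", folder);
                pb.directory(new File(folder));     // child gets its own working directory
                pb.redirectErrorStream(true);
                Process p = pb.start();
                int exit = p.waitFor();
                return folder + " -> exit code " + exit;
            }
        });

An action such as results.collect() would still be needed to actually run the jobs, and the child process output would have to be captured or logged somewhere useful.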
Regards
Where is argsList defined? Is Launcher.main() thread-safe? Note that if
multiple folders are processed on a node, multiple threads may run concurrently
in the executor, each processing a folder.
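For illustration only: if Launcher.main has to stay in-process, calls from concurrent tasks would have to be serialized within each executor JVM, e.g. with a small wrapper like this (Launcher and argsList are the names used in this thread; everything else is my assumption):

// Serializes access to the non-thread-safe Launcher.main within one executor JVM.
static String processFolder(String folder, String[] argsList) throws Exception {
    synchronized (Launcher.class) {   // only one task per JVM enters main() at a time
        Launcher.main(argsList);
    }
    return folder;
}

That throttles each executor to one folder at a time, so forking a separate process per folder (or limiting executors to a single core) is usually the cleaner option.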
> On Jul 14, 2016, at 12:28, Balachandar R.A. wrote:
>
> Hello Ted,
>
> Thanks for the response. Here is the additional information.
> I am using Spark 1.6.1 (spark-1.6.1-bin-hadoop2.6).
>
> Here is the code snippet:
>
> JavaRDD add = jsc.parallelize(listFolders, listFolders.size());
>
> JavaRDD test = add.map(new Function()
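(The archive appears to have stripped the generic type parameters and truncated the snippet. Based on your description, I am reading it roughly as below; the element types and the body of call() are my guesses, not your actual code.)

JavaRDD<String> add = jsc.parallelize(listFolders, listFolders.size());

JavaRDD<String> test = add.map(new Function<String, String>() {
    @Override
    public String call(String folder) throws Exception {
        // Presumed per-folder processing: invoke Launcher.main with the argsList
        // array captured from the driver, as described earlier in the thread.
        // If Launcher.main is not thread-safe, concurrent tasks in the same
        // executor JVM can interfere here.
        Launcher.main(argsList);
        return folder;
    }
});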
Which Spark release are you using?
Can you disclose what the folder processing does (a code snippet would be better)?
Thanks
On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A.
wrote:
> Hello
>
> In one of my use cases, I need to process a list of folders in parallel. I used
> sc.parallelize(list, lis