>>>> a RDD, partitioned using the default partitioner settings (e.g. the number of
>>>> cores in your cluster).
>>>>
>>>> Each of your workers would get one or more slices of data (depending on how
>>>> many cores each executor has), and the abstraction is called a partition.
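
As a rough illustration of the point above (not from the original thread; the paths and app name are made up), a small Java sketch: parallelize() splits a local list into partitions, by default roughly one per core, and each executor then processes the partitions it owns.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class PartitionSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("partition-sketch"));

        // Hypothetical input: the list of files to process.
        List<String> files = Arrays.asList("/data/f1.log", "/data/f2.log", "/data/f3.log");

        // parallelize() slices the list into partitions (spark.default.parallelism of them
        // unless you pass an explicit count); each executor works on its own slices.
        JavaRDD<String> fileRdd = sc.parallelize(files);
        System.out.println("partitions: " + fileRdd.partitions().size());

        sc.stop();
      }
    }
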
>>>
>>> What is your use case? If you want to load the files and continue
>>> processing in parallel, then a simple .map should work.
>>> If you want to execute arbitrary code based on the list of files that
>>> each executor received, then you need to use .foreach, which will get
>>> executed for each of the entries, on the worker.
>>>
>>> -adrian
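
A minimal Java 7 sketch of the two options described above (names are assumptions: fileRdd is a JavaRDD<String> of file paths, and the per-entry work is a placeholder): .map transforms each entry and keeps the result as an RDD, while .foreach runs side-effecting code for each entry on the workers.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.VoidFunction;

    public class ExecutorSideWork {

      // Option 1: continue processing in parallel by transforming each file path.
      static JavaRDD<String> mapOption(JavaRDD<String> fileRdd) {
        return fileRdd.map(new Function<String, String>() {
          @Override
          public String call(String path) {
            // runs on the executor that owns this partition
            return path.toUpperCase(); // placeholder for the real per-file work
          }
        });
      }

      // Option 2: run arbitrary code for each entry, on the workers.
      static void foreachOption(JavaRDD<String> fileRdd) {
        fileRdd.foreach(new VoidFunction<String>() {
          @Override
          public void call(String path) {
            System.out.println("processing " + path); // executed once per entry, executor-side
          }
        });
      }
    }
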
>> From: Vinoth Sankar
>> Date: Wednesday, October 28, 2015 at 2:49 PM
>> To: "user@spark.apac
Date: Wednesday, October 28, 2015 at 2:49 PM
> To: "user@spark.apache.org"
> Subject: How do I parallize Spark Jobs at Executor Level.
Hi,

I'm reading and filtering a large number of files using Spark. It's getting
parallelized at the Spark Driver level only. How do I make it parallelize to
the Executor (Worker) level? Refer to the following sample. Is there any way to
iterate the localIterator in parallel?

Note: I use Java 1.7.
JavaRDD f
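
The sample above is cut off, so the following is only a hypothetical sketch of the pattern under discussion rather than the original code (input/output paths and the predicate are made up). Iterating toLocalIterator() keeps all the work on the driver, one element at a time; expressing the filter as an RDD transformation lets every partition be processed in parallel on the executors.

    import java.util.Iterator;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class FilterOnExecutors {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext();

        // textFile() already splits the input into partitions across the executors.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input/*");

        // Driver-only pattern (what the question describes): sequential iteration.
        long driverCount = 0;
        Iterator<String> it = lines.toLocalIterator();
        while (it.hasNext()) {
          it.next();
          driverCount++; // this whole loop runs on the driver
        }
        System.out.println("driver saw " + driverCount + " lines");

        // Executor-side pattern: the filter runs in parallel, one task per partition.
        JavaRDD<String> matched = lines.filter(new Function<String, Boolean>() {
          @Override
          public Boolean call(String line) {
            return line.contains("ERROR"); // placeholder predicate
          }
        });
        matched.saveAsTextFile("hdfs:///data/output");

        sc.stop();
      }
    }
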