Yes, it's also possible. Just pass in the sequence files you want to
process as a comma-separated list in sc.sequenceFile().
-Deng
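A minimal sketch of this (in spark-shell, where sc is the SparkContext; the paths and Writable types here are assumptions):

import org.apache.hadoop.io.{LongWritable, Text}

// Hypothetical paths. sc.sequenceFile delegates to Hadoop's FileInputFormat,
// which splits the path string on commas, so every listed file lands in one RDD.
val paths = "hdfs:///logs/file1.seq,hdfs:///logs/file2.seq,hdfs:///logs/file3.seq"
val records = sc.sequenceFile(paths, classOf[LongWritable], classOf[Text])
println(records.count())  // the combined RDD's partitions are processed in parallel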
On Fri, Oct 30, 2015 at 5:46 PM, Vinoth Sankar wrote:
Hi Deng.
Thanks for the response.
Is it possible to load the sequence files in parallel and process each of them
in parallel...?
Regards
Vinoth Sankar
On Fri, Oct 30, 2015 at 2:56 PM Deng Ching-Mallete wrote:
Hi,
You seem to be creating a new RDD for each element in your files RDD. What
I would suggest is to load and process only one sequence file in your Spark
job, then just execute multiple Spark jobs to process each sequence file.
With regard to your question of where to view the logs inside the cluster…
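A sketch of that one-file-per-job layout (the object name and argument handling are hypothetical; each sequence file would get its own spark-submit invocation):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

object ProcessOneSequenceFile {
  def main(args: Array[String]): Unit = {
    val path = args(0)  // exactly one sequence file per job invocation
    val sc = new SparkContext(new SparkConf().setAppName(s"process-$path"))
    val records = sc.sequenceFile(path, classOf[LongWritable], classOf[Text])
    println(s"$path: ${records.count()} records")  // e.g. the per-file log count
    sc.stop()
  }
}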
Hi Adrian,
Yes. I need to load all the files and process them in parallel. The following
code doesn't seem to be working (here I used map, and even tried foreach). I'm
just downloading the files from HDFS to the local system and printing the log
count in each file. It's not throwing any exceptions, but it's not working. Files are
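One common cause of this symptom is that the closure runs on the executors, not the driver: downloads land on the workers' local disks, and prints go to the executor logs rather than the driver console. A speculative sketch of the effect (all names made up; in spark-shell):

// Hypothetical reconstruction: filesRDD holds HDFS paths to process.
val filesRDD = sc.parallelize(Seq("hdfs:///logs/a.seq", "hdfs:///logs/b.seq"))
// The body of foreach executes on the executors: any files it writes go to the
// workers' local filesystems, and this println appears in each executor's
// stderr log (viewable per executor in the Spark web UI), not on the driver.
filesRDD.foreach { path =>
  println(s"downloading $path")
}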
The first line is distributing your fileList variable across the cluster as an RDD,
partitioned using the default partitioner settings (e.g. the number of cores in
your cluster).
Each of your workers would get one or more slices of data (depending on how many
cores each executor has) and the abstraction is
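A sketch of the line being described (in spark-shell; the fileList contents are made up):

// parallelize slices the local collection into partitions; by default the count
// follows spark.default.parallelism, typically the total number of executor cores,
// so each worker ends up with one or more slices.
val fileList = Seq("hdfs:///logs/a.seq", "hdfs:///logs/b.seq", "hdfs:///logs/c.seq")
val filesRDD = sc.parallelize(fileList)
println(filesRDD.partitions.length)  // how many slices the list was split into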