Re: How do I parallelize Spark Jobs at Executor Level.

2015-10-30 Thread Deng Ching-Mallete
Yes, it's also possible. Just pass in the sequence files you want to process as a comma-separated list in sc.sequenceFile().

-Deng

On Fri, Oct 30, 2015 at 5:46 PM, Vinoth Sankar wrote:
> Hi Deng.
>
> Thanks for the response.
>
> Is it possible to load sequence files in parallel and process ea
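For illustration, a minimal sketch of loading several sequence files as one RDD this way, assuming Text keys and values and placeholder HDFS paths:

import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}

object LoadMultipleSequenceFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("load-seq-files"))

    // sc.sequenceFile() goes through Hadoop's FileInputFormat, which accepts a
    // comma-separated list of paths (globs work too), so all the files land in
    // one RDD whose partitions are processed in parallel across the executors.
    val paths = "hdfs:///data/part1.seq,hdfs:///data/part2.seq,hdfs:///data/part3.seq"
    val records = sc.sequenceFile(paths, classOf[Text], classOf[Text])

    // Convert the reused Writable objects to plain Strings before shuffling.
    val counts = records.map { case (k, _) => (k.toString, 1L) }.reduceByKey(_ + _)
    counts.take(10).foreach(println)

    sc.stop()
  }
}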

Re: How do I parallelize Spark Jobs at Executor Level.

2015-10-30 Thread Vinoth Sankar
Hi Deng.

Thanks for the response.

Is it possible to load sequence files in parallel and process each of them in parallel...?

Regards,
Vinoth Sankar

On Fri, Oct 30, 2015 at 2:56 PM Deng Ching-Mallete wrote:
> Hi,
>
> You seem to be creating a new RDD for each element in your files RDD. What I wo

Re: How do I parallelize Spark Jobs at Executor Level.

2015-10-30 Thread Deng Ching-Mallete
Hi,

You seem to be creating a new RDD for each element in your files RDD. What I would suggest is to load and process only one sequence file in your Spark job, then just execute multiple Spark jobs to process each sequence file. With regard to your question of where to view the logs inside the cl
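As a rough sketch of this one-file-per-job approach (the class name, key/value types and paths below are placeholders, not the original code), each submitted application reads a single path passed on the command line:

import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}

// One Spark application per sequence file: the path arrives as args(0), so a
// scheduler or a simple shell loop can launch one spark-submit per file.
object ProcessOneSequenceFile {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val sc = new SparkContext(new SparkConf().setAppName(s"process-$inputPath"))

    val records = sc.sequenceFile(inputPath, classOf[Text], classOf[Text])
    println(s"$inputPath contains ${records.count()} records")

    sc.stop()
  }
}

The jobs could then be launched externally, for instance with one spark-submit invocation per file path.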

Re: How do I parallelize Spark Jobs at Executor Level.

2015-10-29 Thread Vinoth Sankar
Hi Adrian,

Yes. I need to load all files and process them in parallel. The following code doesn't seem to be working (here I used map, and even tried foreach). I'm just downloading the files from HDFS to the local system and printing the log count in each file. It's not throwing any exceptions, but it's not working. Files are
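Judging from the replies, the likely culprit is using the SparkContext inside the map/foreach closure. A sketch of one way to do the same download in parallel without nesting RDDs (the paths and the /tmp destination are made up for illustration) uses the Hadoop FileSystem API, which is safe to call inside a task:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object CopyFilesInParallel {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("copy-files"))

    // Placeholder for the fileList from the thread.
    val fileList = Seq("hdfs:///logs/a.seq", "hdfs:///logs/b.seq", "hdfs:///logs/c.seq")

    // This closure runs on the executors, so it must not use the SparkContext.
    // Plain Hadoop FileSystem calls are fine; each file is handled by one task.
    sc.parallelize(fileList).foreach { pathStr =>
      val fs = FileSystem.get(new URI(pathStr), new Configuration())
      val src = new Path(pathStr)
      val dst = new Path("file:///tmp/" + src.getName)
      fs.copyToLocalFile(src, dst)
      // println inside a task shows up in that executor's stdout log,
      // not on the driver console, which can make the job look like it does nothing.
      println(s"Copied $pathStr to $dst")
    }

    sc.stop()
  }
}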

Re: How do I parallelize Spark Jobs at Executor Level.

2015-10-28 Thread Adrian Tanase
The first line is distributing your fileList variable across the cluster as an RDD, partitioned using the default partitioner settings (e.g. the number of cores in your cluster). Each of your workers would get one or more slices of the data (depending on how many cores each executor has) and the abstraction is
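A small sketch of that first line and its default partitioning (the fileList contents are invented for illustration):

import org.apache.spark.{SparkConf, SparkContext}

object PartitioningDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioning-demo"))

    // Stand-in for the fileList from the thread.
    val fileList = (1 to 100).map(i => s"hdfs:///logs/file-$i.seq")

    // Without an explicit numSlices, parallelize() uses spark.default.parallelism,
    // which normally equals the total number of cores across the executors.
    val filesRDD = sc.parallelize(fileList)
    println(s"default parallelism = ${sc.defaultParallelism}")
    println(s"number of slices    = ${filesRDD.partitions.length}")

    // Each slice becomes one task, scheduled on some executor core.
    filesRDD.foreachPartition { iter =>
      println(s"this task sees ${iter.size} file names") // written to executor logs
    }

    sc.stop()
  }
}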