Perfect!! That makes so much sense to me now. Thanks a ton
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Executors-not-utilized-properly-tp7744p7793.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
repartition() is actually just an alias of coalesce(), but with the
shuffle flag set to true. This shuffle is probably what you're seeing as
taking longer, but it is required when you go from a smaller number of
partitions to a larger one.
When actually decreasing the number of partitions, coalesce can avoid a
shuffle by simply merging whole existing partitions.
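The distinction can be seen in a toy sketch (plain Python, no Spark; these `coalesce`/`repartition` functions are simplified stand-ins for the real RDD methods, not Spark's implementation): coalesce only concatenates whole existing partitions, while repartition moves every individual record.

```python
# Toy models of the two operations; not the real Spark implementations.

def coalesce(partitions, n):
    # Merge whole partitions into n groups; no per-record movement (no shuffle).
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)
    return merged

def repartition(partitions, n):
    # Redistribute every record across n partitions (models a full shuffle).
    shuffled = [[] for _ in range(n)]
    for part in partitions:
        for record in part:
            shuffled[hash(record) % n].append(record)
    return shuffled

parts = [[0, 1], [2, 3], [4, 5], [6, 7]]
print(len(coalesce(parts, 2)))     # prints 2
print(len(repartition(parts, 8)))  # prints 8
```

The cheap merge is why shrinking with coalesce avoids a shuffle, and why growing the partition count requires one.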
I found the main reason to be that I was using coalesce instead of
repartition. coalesce was shrinking the partitioning, so there were too
few tasks for all of the executors to run. Can you help me in
understanding when to use coalesce and when to use repartition? In my
application, coale
My use case was to read 3000 files from 3000 different HDFS directories,
so I was reading each file, creating an RDD, and adding it to an array of
JavaRDDs, then doing a union(rdd...). Because of this my program was very
slow (5 minutes). After I replaced this logic with
textFile(path1,path2,path3) it is working fine.
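For reference, the two approaches look roughly like this (a PySpark-style sketch; the Spark calls themselves are commented out and the path names are placeholders, not from the original post):

```python
paths = ["hdfs:///dir1/file", "hdfs:///dir2/file", "hdfs:///dir3/file"]  # placeholders

# Slow: one RDD per file, then a union over thousands of tiny RDDs.
# rdds = [sc.textFile(p) for p in paths]
# combined = sc.union(rdds)

# Fast: textFile accepts a single comma-separated string of paths,
# so Spark plans one job over all the inputs at once.
combined_path = ",".join(paths)
# combined = sc.textFile(combined_path)
print(combined_path)
```

A single textFile call lets Spark compute the splits for all inputs together instead of building and unioning thousands of separate RDDs.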
Hi Abhishek,
> Where mapreduce is taking 2 mins, spark is taking 5 min to complete the
job.
Interesting. Could you tell us more about your program? A "code skeleton"
would certainly be helpful.
Thanks!
-Jey
On Tue, Jun 17, 2014 at 3:21 PM, abhiguruvayya wrote:
I did try creating more partitions by overriding the default number of
partitions determined by HDFS splits. The problem is that in this case
the program will run forever. I have the same set of inputs for MapReduce
and Spark; where MapReduce takes 2 minutes, Spark takes 5 minutes to
complete the job. I thought
It sounds like your job has 9 tasks and all are executing simultaneously
in parallel. That is as good as it gets, right? Are you asking how to
break the work into more tasks, like 120 to match your 10*12 cores? Make
your RDD have more partitions. For example, the textFile method can
override the default minimum number of partitions via its optional second
argument.
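A quick back-of-the-envelope calculation (plain Python; the 10 executors times 12 cores figure is taken from the thread) shows why the partition count caps parallelism:

```python
import math

def waves(num_tasks, total_cores):
    # Number of sequential "waves" a stage needs when each core runs one task.
    return math.ceil(num_tasks / total_cores)

total_cores = 10 * 12  # 10 executors x 12 cores each

for num_tasks in (9, 120, 240):
    idle = max(total_cores - num_tasks, 0)
    print(num_tasks, waves(num_tasks, total_cores), idle)
# 9 tasks   -> 1 wave, 111 cores idle
# 120 tasks -> 1 wave, 0 cores idle
# 240 tasks -> 2 waves, 0 cores idle
```

With only 9 partitions, 111 of the 120 cores sit idle; at 120 or more partitions every core has work to do.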
Can someone help me with this? Any help is appreciated.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Executors-not-utilized-properly-tp7744p7753.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.