My use case was to read 3000 files from 3000 different HDFS directories. I was reading each file into its own RDD, adding it to an array of JavaRDDs, and then calling union(rdd...) on the array. Because of this, my program was very slow (5 minutes). After I replaced this logic with a single textFile(path1,path2,path3) call, it runs much faster (56 seconds). So union() was the overhead.
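For anyone trying the same thing, here is a minimal sketch of how the single path argument can be built. As far as I know, textFile takes several input paths as one comma-separated string, so the sketch only joins the paths using the standard library; the Spark calls are shown as comments. The class name InputPaths, the helper commaSeparated, the context variable sc, and the HDFS paths are all made up for illustration and are not from my actual job.

```java
import java.util.ArrayList;
import java.util.List;

class InputPaths {
    // Joins the paths into one comma-separated string,
    // the form in which textFile accepts multiple inputs.
    static String commaSeparated(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Hypothetical directory layout, for illustration only.
        List<String> paths = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            paths.add("hdfs:///data/dir" + i + "/input.txt");
        }
        String joined = commaSeparated(paths);
        System.out.println(joined);

        // Assuming a JavaSparkContext named sc, the slow pattern was roughly
        // one sc.textFile(p) per path followed by a union over all the RDDs.
        // The faster pattern is a single call:
        //   JavaRDD<String> lines = sc.textFile(joined);
    }
}
```

With 3000 directories the same helper applies; only the list of paths grows.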
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Executors-not-utilized-properly-tp7744p7785.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.