In addition to that, I tried to read the same file with 3000 partitions, but it used 3070 partitions and took more time than the previous run; please refer to the attachment.
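From what I can tell, the second argument to textFile is only a *minimum*-partitions hint passed to the Hadoop InputFormat, and the actual split count is driven by the HDFS block size (and per-file rounding across the 500 part files) rather than by the file count. One workaround I am considering is coalescing after the read; a rough, untested sketch:

val inputFile = "<HDFS File>"           // placeholder path, as before
val inputRdd = sc.textFile(inputFile)   // split count comes from HDFS blocks
println(inputRdd.getNumPartitions)      // 2290 in my case

// coalesce() merges partitions via a narrow dependency (no shuffle),
// unlike repartition(), which would shuffle the full 258.2 GB.
val coalesced = inputRdd.coalesce(500)
println(coalesced.getNumPartitions)     // 500
coalesced.count()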
Thanks & Regards, Gokula Krishnan* (Gokul)* On Tue, Jul 25, 2017 at 8:15 AM, Gokula Krishnan D <email2...@gmail.com> wrote: > Hello All, > > I have a HDFS file with approx. *1.5 Billion records* with 500 Part files > (258.2GB Size) and when I tried to execute the following I could see that > it used 2290 tasks but it supposed to be 500 as like HDFS File, isn't it? > > val inputFile = <HDFS File> > val inputRdd = sc.textFile(inputFile) > inputRdd.count() > > I was hoping that I can do the same with the fewer partitions so tried the > following > > val inputFile = <HDFS File> > val inputrddnqew = sc.textFile(inputFile,500) > inputRddNew.count() > > But still it used 2290 tasks. > > As per scala doc, it supposed use as like the HDFS file i.e 500. > > It would be great if you could throw some insight on this. > > Thanks & Regards, > Gokula Krishnan* (Gokul)* >