Minimum Split of Hadoop RDD

Deep Pradhan Fri, 08 Aug 2014 08:57:38 -0700

Hi,
I am running a single-node Spark cluster on top of HDFS. While going through
the SparkPageRank.scala code, I came across the following line:


val lines = ctx.textFile(args(0), 1)


where args(0) is the path of the input file on HDFS, and the second
argument is the minimum number of splits for the Hadoop RDD (see textFile
in the Spark documentation).
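
My rough guess, from reading Hadoop's FileInputFormat (the old mapred API,
which I believe textFile goes through), is that this value is only a hint
used when the split size is computed. Below is a minimal sketch of that
arithmetic as I understand it, and I am not sure it is right. The object and
method names are my own, the 64 MB block size in the usage note is just an
assumed example, and the sketch ignores the slop Hadoop allows on the last
split.

```scala
object SplitSizeSketch {
  // Assumed to mirror Hadoop's old-API rule:
  //   splitSize = max(minSize, min(goalSize, blockSize))
  // where goalSize = totalSize / requested number of splits.
  def splitSize(totalSize: Long, minSplits: Int, blockSize: Long, minSize: Long = 1L): Long = {
    val goalSize = totalSize / math.max(minSplits, 1) // guard against a hint of 0
    math.max(minSize, math.min(goalSize, blockSize))
  }

  // Approximate split count for a single file (ceiling division).
  // Simplification: Hadoop's tolerance for a slightly oversized last split is ignored.
  def approxSplits(totalSize: Long, minSplits: Int, blockSize: Long): Long = {
    val size = splitSize(totalSize, minSplits, blockSize)
    (totalSize + size - 1) / size
  }
}
```

If this is correct, then for a 1 GB file with 64 MB blocks,
SplitSizeSketch.approxSplits(1024L * 1024 * 1024, 1, 64L * 1024 * 1024) gives
16, and passing 32 instead of 1 gives 32. So the value could raise the number
of splits above the number of blocks but never push it below.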

Could anyone please tell me what role this minimum split plays? Can we
change it? If so, how does it affect performance?

Thank You
