Hi, I am using a single-node Spark cluster on HDFS. While going through the SparkPageRank.scala code, I came across the following line:
*val lines = ctx.textFile(args(0), 1)*
where args(0) is the path of the input file on HDFS, and the second argument is the minimum number of splits for the Hadoop RDD (according to the description of textFile in the Spark documentation). Could anyone please tell me what role this minimum number of splits plays? Can we change it? If so, how does it affect performance? Thank you.