Users would be able to run this already with the 3 lines of code you supplied, right? In general, there are already a lot of methods on SparkContext, and we lean toward the more conservative side in introducing new API variants.
Note that this is something we are doing automatically in Spark SQL for file sources (Dataset/DataFrame).

On Sat, May 14, 2016 at 8:13 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:
> Hello Everyone
>
> Do you think it would be useful to add combinedTextFile method (which uses
> CombineTextInputFormat) to SparkContext?
>
> It allows one task to read data from multiple text files and control
> number of RDD partitions by setting
> mapreduce.input.fileinputformat.split.maxsize
>
> def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
>   val conf = sc.hadoopConfiguration
>   sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
>     classOf[LongWritable], classOf[Text], conf).
>     map(pair => pair._2.toString).setName(path)
> }
>
> Alex
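For context, the quoted helper could be exercised roughly as follows. This is a sketch, not tested code: it assumes a live SparkContext (`sc`), that the `combinedTextFile` definition from the quoted mail is in scope, and the input path (`/data/small-files/*.txt`) is purely illustrative.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Cap each combined split at 128 MB, so many small files are packed
// into fewer, larger partitions (one task reads several files).
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize",
  (128L * 1024 * 1024).toString)

// Hypothetical input path; combinedTextFile is the helper from the
// quoted message above.
val lines: RDD[String] = combinedTextFile(sc)("/data/small-files/*.txt")
println(lines.getNumPartitions) // far fewer partitions than input files
```

The design choice here is that partition count is driven by a Hadoop configuration property rather than a method parameter, which is part of why it can also be achieved without a new SparkContext method.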