Hello Everyone
Do you think it would be useful to add a combinedTextFile method (backed by
CombineTextInputFormat) to SparkContext?
It lets a single task read data from multiple text files, and it lets users
control the number of RDD partitions by setting
mapreduce.input.fileinputformat.split.maxsize.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Pack many small files into fewer splits via CombineTextInputFormat.
def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
  val conf = sc.hadoopConfiguration
  sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
    .map(pair => pair._2.toString).setName(path)
}
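
As a rough sketch of how it would be used (the path and the 128 MB split
size below are just placeholders, not part of the proposal):

// Hypothetical usage: cap each combined split at 128 MB, so the number of
// partitions is roughly total input size / 128 MB regardless of file count.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize",
  (128 * 1024 * 1024).toString)
val lines = combinedTextFile(sc)("hdfs:///data/small-files/*.txt")
println(lines.getNumPartitions)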
Alex