Users would be able to run this already with the 3 lines of code you
supplied, right? In general, there are already a lot of methods on
SparkContext, and we lean toward the conservative side when introducing
new API variants.

Note that this is something we are doing automatically in Spark SQL for
file sources (Dataset/DataFrame).
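A hedged sketch of that automatic behavior, assuming a local `SparkSession` and a hypothetical input path: Spark SQL file sources pack many small files into fewer partitions on their own, with the target partition size governed by `spark.sql.files.maxPartitionBytes`.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only; the path and app name are placeholders.
val spark = SparkSession.builder()
  .appName("combine-demo")
  .master("local[*]")
  // Cap each partition at 128 MB so small files get coalesced together.
  .config("spark.sql.files.maxPartitionBytes", 128L * 1024 * 1024)
  .getOrCreate()

// Many small files are read into comparatively few partitions.
val ds = spark.read.textFile("hdfs:///data/many-small-files/*")
println(ds.rdd.getNumPartitions)
```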


On Sat, May 14, 2016 at 8:13 PM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> Hello Everyone
>
> Do you think it would be useful to add combinedTextFile method (which uses
> CombineTextInputFormat) to SparkContext?
>
> It allows a single task to read data from multiple text files, and the
> number of RDD partitions can be controlled by setting
> mapreduce.input.fileinputformat.split.maxsize
>
>
>   def combinedTextFile(sc: SparkContext)(path: String): RDD[String] = {
>     val conf = sc.hadoopConfiguration
>     sc.newAPIHadoopFile(path, classOf[CombineTextInputFormat],
>         classOf[LongWritable], classOf[Text], conf)
>       .map(pair => pair._2.toString)
>       .setName(path)
>   }
>
>
> Alex
>
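A hedged usage sketch of the proposed helper, assuming a running `SparkContext` named `sc`, the `combinedTextFile` definition quoted above, and a placeholder input path: capping each combined split at 256 MB yields roughly totalInputSize / 256 MB partitions regardless of how many small files back them.

```scala
// Assumption: `sc` is an existing SparkContext and combinedTextFile
// is defined as in the quoted message; the path is illustrative.
sc.hadoopConfiguration.setLong(
  "mapreduce.input.fileinputformat.split.maxsize", 256L * 1024 * 1024)

// Each task may now read from several small files up to the 256 MB cap.
val lines = combinedTextFile(sc)("hdfs:///logs/2016/05/*")
println(lines.getNumPartitions)
```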
