Did you try:
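val data = indexed_files.groupByKey
// Join each group's file paths into one comma-separated string,
// which sc.textFile accepts as a list of input paths.
val modified_data = data.map { a =>
  val name = a._2.mkString(",")
  (a._1, name)
}
// collect() brings the (key, paths) pairs to the driver first:
// sc cannot be used inside an RDD operation running on the executors.
modified_data.collect().foreach { a =>
  val file = sc.textFile(a._2)
  println(file.count)
}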
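
In case it helps, here is a minimal sketch of the grouping step in plain Scala, without Spark, so it can be tried in isolation. It assumes the grouping key is the part of the file name before the first underscore, which is only my guess at what you want. prefixOf and groupPaths are hypothetical names, not part of any Spark API.

```scala
object GroupFilesByPrefix {
  // Assumption: the key is everything before the first underscore,
  // e.g. "20140101_1.csv" gives "20140101".
  def prefixOf(fileName: String): String = fileName.takeWhile(_ != '_')

  // Map each prefix to a comma-separated list of the files that share it.
  def groupPaths(fileNames: Seq[String]): Map[String, String] =
    fileNames.groupBy(prefixOf).map { case (prefix, names) =>
      (prefix, names.sorted.mkString(","))
    }

  def main(args: Array[String]): Unit = {
    // File names taken from the ls listing in the question.
    val files = Seq(
      "20140101_1.csv", "20140101_3.csv", "20140201_2.csv", "20140301_1.csv",
      "20140301_3.csv", "20140101_2.csv", "20140201_1.csv", "20140201_3.csv"
    )
    groupPaths(files).toSeq.sortBy(_._1).foreach { case (prefix, paths) =>
      println(s"$prefix -> $paths")
    }
  }
}
```

Each comma-separated string can then be passed to sc.textFile on the driver to get one RDD per prefix.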
Thanks
Best Regards
On Wed, Jul 22, 2015 at 2:18 AM, MorEru wrote:
> I have a number of CSV files and need to combine them into an RDD by part of
> their filenames.
> For example, for the files below
> $ ls
> 20140101_1.csv 20140101_3.csv 20140201_2.csv 20140301_1.csv
> 20140301_3.csv 20140101_2.csv 20140201_1.csv 20140201_3.csv
> I need to combine files with names 2