Hi,
I started with a CSV text file sorted by its first column and parsed it
into Scala objects using a map operation. I then used further maps to
add some extra info to the data and saved the result as a text file.
The final text file is not sorted. What do I need to do to keep the order
of the original input intact?
My code looks like this:
val csvFile = sc.textFile(..) // file is CSV and ordered by first column
val splitRdd = csvFile map { line => line.split(",", -1) }
val parsedRdd = splitRdd map { parts =>
  val key = parts(0) // use first column as key
  val value = new MyObject(parts(0), parts(1)....) // parse into Scala objects
  (key, value)
}
val augmentedRdd = parsedRdd map { x =>
  val key = x._1
  val value = // add extra fields to x._2
  (key, value)
}
augmentedRdd.saveAsTextFile(...) // this file is not sorted
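One possible fallback, given as a minimal sketch rather than a confirmed fix, is to sort explicitly by key just before saving. sortByKey range-partitions the data, so the part files read in order should be globally sorted. Everything concrete here is an assumption for illustration: Record is a hypothetical stand-in for MyObject, the sample lines are made up, and the local[2] master is only there so the sketch runs on one machine with Spark on the classpath. Note that the keys are strings, so the sort is lexicographic and would not match a file that was sorted numerically.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3, also: import org.apache.spark.SparkContext._

// Hypothetical stand-in for MyObject from the code above
case class Record(id: String, name: String)

object KeepOrderSketch {
  // Parses made-up CSV lines, sorts by key, and returns the keys in output order
  def run(): Seq[String] = {
    val conf = new SparkConf().setAppName("keep-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample lines standing in for the real CSV file
      val lines = sc.parallelize(Seq("a,1", "b,2", "c,3", "d,4"), 2)
      val parsed = lines
        .map(line => line.split(",", -1))
        .map(parts => (parts(0), Record(parts(0), parts(1)))) // first column as key
      // Explicit sort by key (lexicographic, because the keys are strings)
      val sorted = parsed.sortByKey()
      // In the real job this is where sorted.saveAsTextFile would be called
      sorted.keys.collect().toSeq
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(KeepOrderSketch.run().mkString(","))
}
```

The sort costs an extra shuffle, so it only makes sense if the order really is lost somewhere in the pipeline and not merely when the part files are read back.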
Mohit.