Thanks for your replies. I solved the problem with this code:
val weathersRDD = sc.textFile(csvfilePath).map { line =>
  // Strip the quotes, then split the CSV line into its four fields
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  // Key by the month part (yyyy-MM) so this becomes a PairRDD
  (dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
}
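For anyone without a Spark shell handy, the parsing step above can be sketched with plain Scala collections. The object name, the `parse` helper, and the sample line below are hypothetical, added only for illustration; the quote-stripping, splitting, and month-keying mirror the code above.

```scala
object WeatherParseDemo {
  // Hypothetical helper wrapping the per-line logic from the RDD map above
  def parse(line: String): (String, (Int, Int, Int)) = {
    // Strip quotes, split into the four CSV fields
    val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
      line.replaceAll("\"", "").trim.split(",")
    // Key by the month part (yyyy-MM)
    (dayOfdate.substring(0, 7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
  }

  def main(args: Array[String]): Unit = {
    // Sample line is an assumption about the CSV layout
    println(parse("\"2012-01-03\",\"-2\",\"5\",\"1\""))
  }
}
```

Running the same function inside `sc.textFile(...).map { ... }` would produce the PairRDD keyed by month.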
The DataFrame API should be helpful in this case:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
A code snippet would look like:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
If you're going to do it this way, I would output dayOfdate.substring(0,7),
i.e. the month part, and instead of weatherCond you can use
(month, (minDeg, maxDeg, meanDeg)), i.e. a PairRDD, so weathersRDD becomes
RDD[(String, (Int, Int, Int))]. Then use reduceByKey as shown in
multiple Spark examples.
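The reduceByKey step suggested above can be sketched without Spark using plain Scala collections, since `reduceByKey` is just a grouped pairwise reduce. The object name, the `combine` function, and in particular the choice of aggregation (coldest min, hottest max, and a simple pairwise average of the means) are assumptions for illustration, not part of the original thread.

```scala
object MonthlyAggDemo {
  // (minDeg, maxDeg, meanDeg), as held in the PairRDD's value
  type Rec = (Int, Int, Int)

  // Combine two records the way reduceByKey would, pairwise:
  // keep the coldest min, the hottest max, and (a simplification)
  // average the two means.
  def combine(a: Rec, b: Rec): Rec =
    (math.min(a._1, b._1), math.max(a._2, b._2), (a._3 + b._3) / 2)

  // Plain-collections stand-in for weathersRDD.reduceByKey(combine)
  def reduceByMonth(pairs: Seq[(String, Rec)]): Map[String, Rec] =
    pairs.groupBy(_._1).map { case (month, vs) =>
      month -> vs.map(_._2).reduce(combine)
    }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(
      ("2012-01", (-2, 5, 1)),
      ("2012-01", (0, 7, 3)),
      ("2012-02", (1, 9, 5))
    )
    println(reduceByMonth(pairs))
  }
}
```

On the real PairRDD the equivalent would be `weathersRDD.reduceByKey(combine)`; note that a correct mean over many days would need a (sum, count) pair rather than pairwise averaging.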