The DataFrame API should be a good fit for this case: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html
A code snippet would look like this:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // This is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._

    // Assumption: dayToMonth is not a built-in function, so it is registered here
    // as a UDF that keeps the "yyyy-MM" prefix of the date string.
    sqlContext.udf.register("dayToMonth", (day: String) => day.substring(0, 7))

    weathersRDD.toDF.registerTempTable("weathers")
    val results = sqlContext.sql(
      "SELECT avg(minDeg), avg(maxDeg), avg(meanDeg) FROM weathers GROUP BY dayToMonth(dayOfdate)")
    results.collect.foreach(println)

-----Original Message-----
From: barisak [mailto:baris.akg...@gmail.com]
Sent: Monday, April 6, 2015 10:50 PM
To: user@spark.apache.org
Subject: Spark Avarage

Hi

I have the class described below:

    case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

I am reading the data from a CSV file and putting it into the weatherCond class with this code:

    val weathersRDD = sc.textFile("weather.csv").map { line =>
      val Array(dayOfdate, minDeg, maxDeg, meanDeg) = line.replaceAll("\"","").trim.split(",")
      weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
    }

The question is: how can I average the minDeg, maxDeg and meanDeg values for each month?

The data set example:

    day, min, max, mean
    2014-03-17,-3,5,5
    2014-03-18,6,7,7
    2014-03-19,6,14,10

The result has to be (2014-03, 3, 8.6, 7.3) -- (average for 2014-03)

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
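
For reference, the per-month averaging asked about in the quoted message can be sketched with plain Scala collections, without a Spark cluster. This is only an illustration of the grouping arithmetic, not the Spark code itself: the object name MonthlyAverage and the helpers dayToMonth and monthlyAverages are hypothetical, and the sketch assumes every dayOfdate string has the form "yyyy-MM-dd".

```scala
// Mirrors the case class from the quoted message.
case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

object MonthlyAverage {
  // Hypothetical helper (assumption): keep the "yyyy-MM" prefix of a "yyyy-MM-dd" string.
  def dayToMonth(day: String): String = day.take(7)

  // Group rows by month, then average each column within the group.
  // Returns month -> (average minDeg, average maxDeg, average meanDeg).
  def monthlyAverages(rows: Seq[weatherCond]): Map[String, (Double, Double, Double)] =
    rows.groupBy(w => dayToMonth(w.dayOfdate)).map { case (month, ws) =>
      val n = ws.size.toDouble
      val avgMin = ws.map(_.minDeg).sum / n
      val avgMax = ws.map(_.maxDeg).sum / n
      val avgMean = ws.map(_.meanDeg).sum / n
      (month, (avgMin, avgMax, avgMean))
    }

  def main(args: Array[String]): Unit = {
    // The three sample rows from the quoted message.
    val sample = Seq(
      weatherCond("2014-03-17", -3, 5, 5),
      weatherCond("2014-03-18", 6, 7, 7),
      weatherCond("2014-03-19", 6, 14, 10)
    )
    monthlyAverages(sample).foreach(println)
  }
}
```

For the three sample rows this gives a single 2014-03 entry with averages of 3.0, roughly 8.67 and roughly 7.33. On an RDD the same idea would probably be expressed by keying each record by month and combining sums and counts (for example with reduceByKey) rather than with groupBy, but that variant is not shown here.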