The DataFrame API should work well in this case.  
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html

A code snippet would look like this:

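val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
// dayToMonth is not built in. This UDF is an assumption: it keeps the
// "yyyy-MM" prefix of a "yyyy-MM-dd" date string.
sqlContext.udf.register("dayToMonth", (day: String) => day.substring(0, 7))
weathersRDD.toDF.registerTempTable("weathers")
// The column is named dayOfdate in the case class, so the query must match it.
val results = sqlContext.sql(
  """SELECT dayToMonth(dayOfdate), avg(minDeg), avg(maxDeg), avg(meanDeg)
    |FROM weathers GROUP BY dayToMonth(dayOfdate)""".stripMargin)
results.collect.foreach(println)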
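
To sanity-check the expected numbers without a cluster, the same per-month averaging can be sketched with plain Scala collections on the three sample rows from the quoted message. This is only an illustration under assumptions: WeatherCond, monthOf and avg are hypothetical names, and the month key is taken to be the first seven characters of a "yyyy-MM-dd" date string.

```scala
// Plain Scala sketch (no Spark). WeatherCond, monthOf and avg are
// illustrative names, not part of the original code.
case class WeatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

// Assumes dates are formatted as "yyyy-MM-dd"; keeps the "yyyy-MM" prefix.
def monthOf(day: String): String = day.substring(0, 7)

// Arithmetic mean of a non-empty sequence of integers.
def avg(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

// The three sample rows from the quoted message.
val sample = Seq(
  WeatherCond("2014-03-17", -3, 5, 5),
  WeatherCond("2014-03-18", 6, 7, 7),
  WeatherCond("2014-03-19", 6, 14, 10)
)

// Group rows by month, then average each column within the group.
val monthly: Map[String, (Double, Double, Double)] =
  sample.groupBy(w => monthOf(w.dayOfdate)).map { case (month, ws) =>
    month -> (avg(ws.map(_.minDeg)), avg(ws.map(_.maxDeg)), avg(ws.map(_.meanDeg)))
  }

monthly.foreach { case (month, (minAvg, maxAvg, meanAvg)) =>
  println(s"$month, $minAvg, $maxAvg, $meanAvg")
}
```

For 2014-03 the averages work out to 9/3 = 3, 26/3 (about 8.67) and 22/3 (about 7.33), which matches the result asked for in the original message up to rounding.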


-----Original Message-----
From: barisak [mailto:baris.akg...@gmail.com] 
Sent: Monday, April 6, 2015 10:50 PM
To: user@spark.apache.org
Subject: Spark Avarage

Hi 

I have the class described below.

case class weatherCond(dayOfdate: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

I am reading the data from a CSV file and putting it into the weatherCond
class with this code:

val weathersRDD = sc.textFile("weather.csv").map { line =>
  val Array(dayOfdate, minDeg, maxDeg, meanDeg) =
    line.replaceAll("\"", "").trim.split(",")
  weatherCond(dayOfdate, minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
}

The question is: how can I average the minDeg, maxDeg and meanDeg values for
each month?

Example data set:

day, min, max , mean
2014-03-17,-3,5,5
2014-03-18,6,7,7
2014-03-19,6,14,10

The result has to be (2014-03, 3, 8.6, 7.3), i.e. the averages for 2014-03.

Thanks





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Avarage-tp22391.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org

