Re: Spark Avarage

2015-04-06 Thread baris akgun
Thanks for your replies. I solved the problem with this code:
    val weathersRDD = sc.textFile(csvfilePath).map { line =>
      val Array(dayOfdate, minDeg, maxDeg, meanDeg) = line.replaceAll("\"","").trim.split(",")
      Tuple2(dayOfdate.substring(0,7), (minDeg.toInt, maxDeg.toInt, meanDeg.toInt))
    }.ma

RE: Spark Avarage

2015-04-06 Thread Cheng, Hao
The DataFrame API should be very helpful in this case. https://spark.apache.org/docs/1.3.0/sql-programming-guide.html A code snippet would look like:
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implici
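
For readers following the DataFrame suggestion, here is a minimal sketch of how a per-month average could be computed with it. It is not the code from the original message, which is cut off in this archive. The object name WeatherDataFrame, the case class Weather, its field names, and the helper parseWeather are all hypothetical; the four-column CSV layout and the substring(0,7) month key are taken from the other messages in this thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.avg

object WeatherDataFrame {
  // Hypothetical row type; the field names are assumptions, not from the thread.
  case class Weather(month: String, minDeg: Int, maxDeg: Int, meanDeg: Int)

  // Parse one CSV row, assuming four columns and a date whose
  // first seven characters identify the month (e.g. "2014-01").
  def parseWeather(line: String): Weather = {
    val Array(dayOfDate, minDeg, maxDeg, meanDeg) =
      line.replaceAll("\"", "").trim.split(",")
    Weather(dayOfDate.substring(0, 7), minDeg.toInt, maxDeg.toInt, meanDeg.toInt)
  }

  // Build a DataFrame from the CSV file and average each column per month.
  def monthlyAverages(sc: SparkContext, csvFilePath: String): DataFrame = {
    val sqlContext = new SQLContext(sc)
    // Brings the RDD-to-DataFrame conversion (toDF) into scope.
    import sqlContext.implicits._

    val weather = sc.textFile(csvFilePath).map(parseWeather).toDF()
    weather.groupBy("month").agg(avg("minDeg"), avg("maxDeg"), avg("meanDeg"))
  }
}
```

Calling WeatherDataFrame.monthlyAverages(sc, csvFilePath).show() would print one row per month. As far as I recall, Spark 1.4 and later keep the grouping column in the result by default, while on 1.3 the month column may need to be listed explicitly in agg.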

Re: Spark Avarage

2015-04-06 Thread Yana Kadiyska
If you're going to do it this way, I would output dayOfdate.substring(0,7), i.e. the month part, and instead of weatherCond, you can use (month,(minDeg,maxDeg,meanDeg)), i.e. a PairRDD. So weathersRDD: RDD[(String,(Double,Double,Double))]. Then use a reduceByKey as shown in multiple Spark examples..Y
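
A minimal sketch of the reduceByKey approach described above, under some assumptions. An average cannot be merged pairwise on its own, so each value carries running sums plus a row count, and the division happens once per month at the end. The object name MonthlyAverages, the type alias Acc, and the helpers parseLine, merge, and average are hypothetical names introduced for illustration; the CSV layout follows the parsing code posted earlier in the thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object MonthlyAverages {
  // Running sums of (minDeg, maxDeg, meanDeg) plus the number of rows seen.
  type Acc = (Double, Double, Double, Long)

  // Parse one CSV row into (month, accumulator), assuming four columns and
  // a date whose first seven characters identify the month.
  def parseLine(line: String): (String, Acc) = {
    val Array(dayOfDate, minDeg, maxDeg, meanDeg) =
      line.replaceAll("\"", "").trim.split(",")
    (dayOfDate.substring(0, 7),
      (minDeg.toDouble, maxDeg.toDouble, meanDeg.toDouble, 1L))
  }

  // Associative, commutative merge, as reduceByKey requires.
  def merge(a: Acc, b: Acc): Acc =
    (a._1 + b._1, a._2 + b._2, a._3 + b._3, a._4 + b._4)

  // Turn the sums into per-column averages.
  def average(acc: Acc): (Double, Double, Double) =
    (acc._1 / acc._4, acc._2 / acc._4, acc._3 / acc._4)

  // Result type matches the RDD[(String,(Double,Double,Double))] shape above.
  def monthlyAverages(sc: SparkContext,
                      csvFilePath: String): RDD[(String, (Double, Double, Double))] =
    sc.textFile(csvFilePath)
      .map(parseLine)
      .reduceByKey(merge)
      .mapValues(average)
}
```

Because merge only adds tuples, Spark can combine partial sums on each partition before shuffling, which is the usual reason to prefer reduceByKey over groupByKey for this kind of aggregation.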