What version of Spark are you using? Is it the one that comes with DSE 4.7? We just cannot reproduce this in Spark itself:

yzhang@localhost>$ more test.spark
val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs.reduceByKey((x,y) => x + y).collect

yzhang@localhost>$ ~/spark/bin/spark-shell --master local -i test.spark
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4
Spark context available as sc.
SQL context available as sqlContext.
Loading test.spark...
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:21
15/08/21 09:58:51 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
res0: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Yong
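[Editor's note: for reference, the same check Yong ran in spark-shell can be written as a self-contained Spark 1.x application. This is only a minimal sketch; the object name, app name, and local master setting are illustrative, and only the data and the reduceByKey call come from the thread.]

// Hypothetical standalone version of test.spark
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReduceByKeyCheck").setMaster("local")
    val sc = new SparkContext(conf)

    // Same input pairs as in test.spark
    val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

    // reduceByKey only describes the shuffle; collect is the action that runs it
    val summed = pairs.reduceByKey((x, y) => x + y).collect()
    summed.foreach(println)   // expected: (0,3) (1,50) (2,40)

    sc.stop()
  }
}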
Date: Fri, 21 Aug 2015 19:24:09 +0530
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: jsatishchan...@gmail.com
To: abhis...@tetrationanalytics.com
CC: user@spark.apache.org

Hi Abhishek,

I have even tried that, but rdd2 is still empty.

Regards,
Satish

On Fri, Aug 21, 2015 at 6:47 PM, Abhishek R. Singh <abhis...@tetrationanalytics.com> wrote:

You had:

> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)

Maybe try:

> rdd2 = RDD.reduceByKey((x,y) => x+y)
> rdd2.take(3)

-Abhishek-

On Aug 20, 2015, at 3:05 AM, satish chandra j <jsatishchan...@gmail.com> wrote:

> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: org.apache.spark.rdd.RDD[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
> I am expecting the output Array((0,3), (1,50), (2,40)), i.e. just a sum over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command used:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
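[Editor's note: below is a minimal Scala sketch of the idiom Abhishek suggests, assigning the result of the transformation to a new RDD and running the action on that RDD, using the thread's sample data and the sc provided by spark-shell. Variable names are illustrative. The thread itself never pins down why Satish's run returned an empty array, so this shows the recommended pattern rather than a confirmed fix.]

// Sample data from the thread
val data = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// reduceByKey is a lazy transformation: assign its result to a new RDD
// and call the action (take or collect) on that RDD, not on the original.
val rdd2 = data.reduceByKey((x, y) => x + y)
rdd2.take(3).foreach(println)   // expected: (0,3) (1,50) (2,40)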