I believe "spark-shell -i scriptFile" is there; we also use it, at least in Spark 1.3.1. "dse spark" just wraps the "spark-shell" command; underneath, it is simply invoking "spark-shell". I don't know much about the original problem, though.

Yong
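To make the wrapping concrete, a minimal sketch (the script name here is hypothetical): -i preloads the file into the Spark REPL at startup, evaluating it as if typed at the prompt, so both invocations below should end up doing the same thing.

// test.scala -- hypothetical script preloaded into the REPL via -i.
// "sc" (the SparkContext) is already provided by spark-shell, so no
// imports or setup are needed inside the script.
println(sc.parallelize(1 to 10).sum())   // should print 55.0

spark-shell -i test.scala
dse spark -i test.scala    (the DSE wrapper; invokes spark-shell underneath)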
Date: Fri, 21 Aug 2015 18:19:49 +0800
Subject: Re: Transformation not happening for reduceByKey or GroupByKey
From: zjf...@gmail.com
To: jsatishchan...@gmail.com
CC: robin.e...@xense.co.uk; user@spark.apache.org

Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.

On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <jsatishchan...@gmail.com> wrote:

Hi Robin,
Yes, it is DSE, but the issue is related to Spark only.

Regards,
Satish Chandra

On Fri, Aug 21, 2015 at 3:06 PM, Robin East <robin.e...@xense.co.uk> wrote:

Not sure, never used dse - it's part of DataStax Enterprise, right?

On 21 Aug 2015, at 10:07, satish chandra j <jsatishchan...@gmail.com> wrote:

Hi Robin,
Yes, the piece of code below works fine in the Spark shell, but when the same is placed in a script file and executed with -i <file name>, it creates an empty RDD.

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command: dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>

I understand I am missing something here, due to which my final RDD does not have the required output.

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.e...@xense.co.uk> wrote:

This works for me:

scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

On 20 Aug 2015, at 11:05, satish chandra j <jsatishchan...@gmail.com> wrote:

Hi All,
I have data in an RDD as mentioned below:

RDD: Array[(Int, Int)] = Array((0,1),(0,2),(1,20),(1,30),(2,40))

I am expecting the output Array((0,3),(1,50),(2,40)), just a sum function on the values for each key.

Code:
RDD.reduceByKey((x,y) => x+y)
RDD.take(3)

Result in console:
RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
res: Array[(Int,Int)] = Array()

Command as mentioned:
dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>

Please let me know what is missing in my code, as my resultant Array is empty.

Regards,
Satish

--
Best Regards

Jeff Zhang
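One thing worth noting, going only by the code quoted above: reduceByKey is a transformation that returns a new RDD and never modifies the RDD it is called on, so calling it without binding the result discards the summed data. A minimal sketch of a script file that binds and then collects the result (script name and contents assumed for illustration, not taken from the original poster's file):

// sumByKey.scala (hypothetical), run as:
//   dse spark --master local --jars postgresql-9.4-1201.jar -i sumByKey.scala
// "sc" is the SparkContext that spark-shell / dse spark already provides.
val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// Bind the transformation's result to a new name; pairs.reduceByKey(...)
// on its own builds a ShuffledRDD and then throws it away.
val summed = pairs.reduceByKey((x, y) => x + y)

// collect is an action: it runs the job and should return
// Array((0,3), (1,50), (2,40)).
summed.collect().foreach(println)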