Yeah, I got it! Using println to debug is a great way for me to explore Spark. Thank you very much for your kind help.
On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:

> Here's a way to debug something like this:
>
> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) => {
>   println("v1: " + v1)
>   println("v2: " + v2)
>   (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
> }).collect
>
> You get:
> v1: 1 2 3 4 5
> v2: 1 2 3 4 5
> v1: 4
> v2: 1 2 3 4 5
> java.lang.ArrayIndexOutOfBoundsException: 1
>
> reduceByKey() works kind of like regular Scala reduce(). So it will call
> the function on the first two values, then on the result of that and the
> next value, then the result of that and the next value, and so on. First
> you add 2+2 and get 4. Then your function is called with v1="4" and v2 is
> the third line.
>
> What you could do instead:
>
> scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect
>
>
> On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:
>
>> Hi,
>>
>> I am new to Spark. When trying to write some simple tests in the Spark
>> shell, I ran into the following problem.
>>
>> I created a very small text file named 5.txt:
>> 1 2 3 4 5
>> 1 2 3 4 5
>> 1 2 3 4 5
>>
>> and experimented in the Spark shell:
>>
>> scala> val d5 = sc.textFile("5.txt").cache()
>> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at <console>:12
>>
>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>>
>> Then an error occurs:
>> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
>> java.lang.ArrayIndexOutOfBoundsException: 1
>>     at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>     at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>     at org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>>
>> When I delete one line from the file, leaving two lines, the result is
>> correct. I don't understand what the problem is. Please help me, thanks.
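
For anyone who wants to see the behaviour described in this thread without starting a Spark shell, here is a minimal sketch using plain Scala collections. It assumes that List.reduce is a fair stand-in for what reduceByKey does to the values of a single key on one partition. The names ReduceTypeDemo, addSecondFields and sumSecondFields are made up for illustration and do not come from the thread or from Spark.

```scala
// Plain Scala collections, no Spark needed. `lines` mirrors the three rows of 5.txt.
object ReduceTypeDemo {
  val lines: List[String] = List("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5")

  // Same shape as the failing function: it expects full lines but returns a bare number.
  def addSecondFields(v1: String, v2: String): String =
    (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString

  // Parse first, then reduce over Ints, so inputs and result have the same form.
  def sumSecondFields(rows: List[String]): Int =
    rows.map(_.split(" ")(1).toInt).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    // Two rows: the function runs once, so the mismatch never shows up.
    println(lines.take(2).reduce((a, b) => addSecondFields(a, b))) // prints 4

    // Three rows: the second call receives "4", which has no element at index 1.
    val attempt = scala.util.Try(lines.reduce((a, b) => addSecondFields(a, b)))
    println(attempt.isFailure) // prints true

    // Parsing before reducing handles any number of rows.
    println(sumSecondFields(lines)) // prints 6
  }
}
```

This also shows why the two-line file appeared to work: with only two values the function is never fed its own output. On a real cluster, reduceByKey may combine values in a different order across partitions, so the function passed to it should accept its own results as inputs and be associative and commutative.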