Got it, thank you.
On Fri, Apr 18, 2014 at 9:55 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Ah, I'm not saying println is bad, it's just that you need to go to the
> right place to locate the output, e.g. you can check the stdout of any
> executor from the Web UI.
>
>
> On Fri, Apr 18, 2014 at 9:48 AM, 诺铁 <noty...@gmail.com> wrote:
>
>> Hi Cheng,
>>
>> Thank you for letting me know this. So what do you think is a better
>> way to debug?
>>
>>
>> On Fri, Apr 18, 2014 at 9:27 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>
>>> A tip: using println is only convenient when you are working in local
>>> mode. When running Spark in cluster mode (standalone/YARN/Mesos), the
>>> output of println goes to executor stdout.
>>>
>>>
>>> On Fri, Apr 18, 2014 at 6:53 AM, 诺铁 <noty...@gmail.com> wrote:
>>>
>>>> Yeah, I got it! Using println to debug is a great way for me to
>>>> explore Spark. Thank you very much for your kind help.
>>>>
>>>>
>>>> On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos
>>>> <daniel.dara...@lynxanalytics.com> wrote:
>>>>
>>>>> Here's a way to debug something like this:
>>>>>
>>>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => {
>>>>>          println("v1: " + v1)
>>>>>          println("v2: " + v2)
>>>>>          (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
>>>>>        }).collect
>>>>>
>>>>> You get:
>>>>>
>>>>> v1: 1 2 3 4 5
>>>>> v2: 1 2 3 4 5
>>>>> v1: 4
>>>>> v2: 1 2 3 4 5
>>>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>
>>>>> reduceByKey() works kind of like regular Scala reduce(). It calls the
>>>>> function on the first two values, then on the result of that and the
>>>>> next value, then on the result of that and the next value, and so on.
>>>>> First you add 2 + 2 and get "4". Then your function is called with
>>>>> v1 = "4" and v2 = the third line, and "4".split(" ") has no element at
>>>>> index 1, hence the ArrayIndexOutOfBoundsException.
>>>>>
>>>>> What you could do instead:
>>>>>
>>>>> scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect
>>>>>
>>>>>
>>>>> On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new to Spark. When trying to write some simple tests in the
>>>>>> spark-shell, I ran into the following problem.
>>>>>>
>>>>>> I created a very small text file named 5.txt:
>>>>>>
>>>>>> 1 2 3 4 5
>>>>>> 1 2 3 4 5
>>>>>> 1 2 3 4 5
>>>>>>
>>>>>> and experimented in the spark-shell:
>>>>>>
>>>>>> scala> val d5 = sc.textFile("5.txt").cache()
>>>>>> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at <console>:12
>>>>>>
>>>>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>>>>>>
>>>>>> Then this error occurs:
>>>>>>
>>>>>> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
>>>>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>>>         at org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>>>>>>
>>>>>> When I delete one line from the file, making it 2 lines, the result
>>>>>> is correct. I don't understand what the problem is. Please help me,
>>>>>> thanks.
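
Daniel's point that reduceByKey() behaves much like a regular Scala reduce() can be seen without Spark at all. A minimal sketch of my own (not from the thread): the first call combines the first two lines into the string "4", and the second call then evaluates "4".split(" ")(1), which is out of bounds.

scala> val lines = List("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5")   // same data as 5.txt

scala> lines.reduce((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString)   // blows up on the second call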
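
For reference, a minimal spark-shell sketch of the mapValues approach Daniel suggests above, assuming the same three-line 5.txt and the sc the shell provides. Parsing the numeric field once in mapValues means reduceByKey only ever combines Ints, so the function never has to re-split its own string result.

scala> val d5 = sc.textFile("5.txt").cache()

scala> val sums = d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey(_ + _)   // values are Ints before the reduce

scala> sums.collect   // for the three-line file this should give Array((1,6))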
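
On Cheng's point about println output landing in executor stdout under cluster mode: an alternative, not suggested in the thread but workable for small samples, is to pull a few records back to the driver and print them there; take() returns a plain Array to the driver, so println runs locally and the output shows up in the shell regardless of deploy mode.

scala> d5.keyBy(_.split(" ")(0)).take(3).foreach(println)   // prints on the driver, not on the executors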