A tip: using println is only convenient when you are working in local
mode. When running Spark in cluster mode (standalone/YARN/Mesos), the
output of println goes to the executors' stdout, not to your driver's
console, so you won't see it in the shell.
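
If you do need to inspect values from the driver in cluster mode, one
workaround (a minimal sketch; "rdd" stands in for any reasonably small
RDD) is to pull the data back to the driver before printing:

scala> // take() returns a local Array, so this println runs on the driver
scala> rdd.take(10).foreach(println)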


On Fri, Apr 18, 2014 at 6:53 AM, 诺铁 <noty...@gmail.com> wrote:

> Yeah, I got it!
> Using println to debug is a great way for me to explore Spark.
> Thank you very much for your kind help.
>
>
>
> On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos <
> daniel.dara...@lynxanalytics.com> wrote:
>
>> Here's a way to debug something like this:
>>
>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) => {
>>            println("v1: " + v1)
>>            println("v2: " + v2)
>>            (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
>>        }).collect
>>
>> You get:
>> v1: 1 2 3 4 5
>> v2: 1 2 3 4 5
>> v1: 4
>> v2: 1 2 3 4 5
>> java.lang.ArrayIndexOutOfBoundsException: 1
>>
>> reduceByKey() works much like regular Scala reduce(): it calls the
>> function on the first two values, then on the result of that and the
>> next value, and so on. So first you add 2 + 2 and get "4". Then your
>> function is called with v1 = "4" and v2 = the third line, and since
>> "4".split(" ") has only one element, index 1 is out of bounds.
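>>
>> You can reproduce the same folding behaviour with a plain Scala reduce
>> (a minimal sketch, no Spark needed; the three strings stand in for the
>> three lines of 5.txt):
>>
>> scala> List("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5").reduce { (v1, v2) =>
>>            // 1st call: v1 = "1 2 3 4 5", v2 = "1 2 3 4 5" => "4"
>>            // 2nd call: v1 = "4", so v1.split(" ")(1) is out of bounds
>>            (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
>>        }
>> java.lang.ArrayIndexOutOfBoundsException: 1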
>>
>> What you could do instead:
>>
>> scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).
>>            reduceByKey((v1, v2) => v1 + v2).collect
>>
>>
>> On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am new to Spark. When trying to write some simple tests in the Spark
>>> shell, I ran into the following problem.
>>>
>>> I created a very small text file named 5.txt:
>>> 1 2 3 4 5
>>> 1 2 3 4 5
>>> 1 2 3 4 5
>>>
>>> and experimented in the Spark shell:
>>>
>>> scala> val d5 = sc.textFile("5.txt").cache()
>>> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at
>>> <console>:12
>>>
>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1,v2) =>
>>>            (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>>>
>>> Then this error occurs:
>>> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>> at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>  at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>> at
>>> org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>>>
>>> When I delete one line from the file, making it two lines, the result
>>> is correct. I don't understand what the problem is. Please help me,
>>> thanks.
>>>
>>>
>>
>
