Got it, thank you.
On Fri, Apr 18, 2014 at 9:55 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Ah, I'm not saying println is bad, it's just that you need to go to the
> right place to locate the output, e.g. you can check the stdout of any
> executor from the Web UI.
>
>
> On Fri, Apr 18, 2014 at 9:48 AM, 诺铁 <noty...@gmail.com> wrote:
>
>> Hi Cheng,
>>
>> Thank you for letting me know this. So what do you think is a better
>> way to debug?
>>
>>
>> On Fri, Apr 18, 2014 at 9:27 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
>>
>>> A tip: using println is only convenient when you are working in local
>>> mode. When running Spark in cluster mode (standalone/YARN/Mesos), the
>>> output of println goes to executor stdout.
>>>
>>>
>>> On Fri, Apr 18, 2014 at 6:53 AM, 诺铁 <noty...@gmail.com> wrote:
>>>
>>>> Yeah, I got it! Using println to debug is a great way for me to
>>>> explore Spark. Thank you very much for your kind help.
>>>>
>>>>
>>>> On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos
>>>> <daniel.dara...@lynxanalytics.com> wrote:
>>>>
>>>>> Here's a way to debug something like this:
>>>>>
>>>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => {
>>>>>          println("v1: " + v1)
>>>>>          println("v2: " + v2)
>>>>>          (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString
>>>>>        }).collect
>>>>>
>>>>> You get:
>>>>>
>>>>> v1: 1 2 3 4 5
>>>>> v2: 1 2 3 4 5
>>>>> v1: 4
>>>>> v2: 1 2 3 4 5
>>>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>
>>>>> reduceByKey() works kind of like regular Scala reduce(). It calls the
>>>>> function on the first two values, then on the result of that and the
>>>>> next value, then on the result of that and the next value, and so on.
>>>>> First you add 2 + 2 and get "4". Then your function is called with
>>>>> v1 = "4" and v2 = the third line, and "4".split(" ") has no element at
>>>>> index 1, hence the ArrayIndexOutOfBoundsException.
>>>>>
>>>>> What you could do instead:
>>>>>
>>>>> scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect
>>>>>
>>>>>
>>>>> On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 <noty...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new to Spark. When trying to write some simple tests in the
>>>>>> spark-shell, I ran into the following problem.
>>>>>>
>>>>>> I created a very small text file named 5.txt:
>>>>>>
>>>>>> 1 2 3 4 5
>>>>>> 1 2 3 4 5
>>>>>> 1 2 3 4 5
>>>>>>
>>>>>> and experimented in the spark-shell:
>>>>>>
>>>>>> scala> val d5 = sc.textFile("5.txt").cache()
>>>>>> d5: org.apache.spark.rdd.RDD[String] = MappedRDD[91] at textFile at <console>:12
>>>>>>
>>>>>> scala> d5.keyBy(_.split(" ")(0)).reduceByKey((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString).first
>>>>>>
>>>>>> Then this error occurs:
>>>>>>
>>>>>> 14/04/18 00:20:11 ERROR Executor: Exception in task ID 36
>>>>>> java.lang.ArrayIndexOutOfBoundsException: 1
>>>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>>>         at $line60.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(<console>:15)
>>>>>>         at org.apache.spark.util.collection.ExternalAppendOnlyMap$$anonfun$2.apply(ExternalAppendOnlyMap.scala:120)
>>>>>>
>>>>>> When I delete one line from the file, making it 2 lines, the result
>>>>>> is correct. I don't understand what the problem is. Please help me,
>>>>>> thanks.
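
Daniel's point that reduceByKey() behaves much like a regular Scala reduce() can be seen without Spark at all. A minimal sketch of my own (not from the thread): the first call combines the first two lines into the string "4", and the second call then evaluates "4".split(" ")(1), which is out of bounds.

scala> val lines = List("1 2 3 4 5", "1 2 3 4 5", "1 2 3 4 5")   // same data as 5.txt

scala> lines.reduce((v1, v2) => (v1.split(" ")(1).toInt + v2.split(" ")(1).toInt).toString)   // blows up on the second call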
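
For reference, a minimal spark-shell sketch of the mapValues approach Daniel suggests above, assuming the same three-line 5.txt and the sc the shell provides. Parsing the numeric field once in mapValues means reduceByKey only ever combines Ints, so the function never has to re-split its own string result.

scala> val d5 = sc.textFile("5.txt").cache()

scala> val sums = d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey(_ + _)   // values are Ints before the reduce

scala> sums.collect   // for the three-line file this should give Array((1,6))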
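
On Cheng's point about println output landing in executor stdout under cluster mode: an alternative, not suggested in the thread but workable for small samples, is to pull a few records back to the driver and print them there; take() returns a plain Array to the driver, so println runs locally and the output shows up in the shell regardless of deploy mode.

scala> d5.keyBy(_.split(" ")(0)).take(3).foreach(println)   // prints on the driver, not on the executors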