Simple: add a debug statement in ExternalSorter.scala just before the line
that throws the NPE, recompile the Spark assembly jar, and confirm the
source of the null.
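
For example, a minimal sketch of the kind of debug line meant here (the
logWarning call is the added line; ExternalSorter already mixes in Spark's
Logging trait, and the exact surrounding code varies by Spark version):

  while (records.hasNext) {
    addElementsRead()
    kv = records.next()
    // Added debug: report a null record or a null key before changeValue
    if (kv == null || kv._1 == null) {
      logWarning(s"insertAll saw null input: kv=$kv")
    }
    map.changeValue((getPartition(kv._1), kv._1), update)
    maybeSpillCollection(usingMap = true)
  }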


On Saturday, March 12, 2016, saurabh guru <saurabh.g...@gmail.com> wrote:

> I don't see how that would be possible. I am reading from a live stream of
> data through kafka.
>
> On Sat 12 Mar, 2016 20:28 Ted Yu, <yuzhih...@gmail.com> wrote:
>
>> Interesting.
>> If kv._1 were null, shouldn't the NPE have come from getPartition()
>> (line 105)?
>>
>> Was it possible that records.next() returned null?
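>>
>> A minimal illustration of the difference (hypothetical snippet, not
>> from the job's code): a null *record* throws at the kv._1 access
>> itself, matching line 192, whereas a null *key* would be passed into
>> getPartition() first.
>>
>>   val records = Iterator[(String, Int)](null)  // one null record
>>   val kv = records.next()                      // kv is null
>>   kv._1                                        // NullPointerException here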
>>
>> On Fri, Mar 11, 2016 at 11:20 PM, Prabhu Joseph <
>> prabhujose.ga...@gmail.com> wrote:
>>
>>> Looking at ExternalSorter.scala line 192, I suspect some input record
>>> has a null key.
>>>
>>> 189  while (records.hasNext) {
>>> 190    addElementsRead()
>>> 191    kv = records.next()
>>> 192    map.changeValue((getPartition(kv._1), kv._1), update)
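>>>
>>> If so, a guard in the job before the shuffle should confirm (and work
>>> around) it. Illustrative only; `pairs` stands for whatever (key, value)
>>> RDD or DStream feeds this stage:
>>>
>>>   val cleaned = pairs.filter(kv => kv != null && kv._1 != null)
>>>   // shuffle operations (e.g. reduceByKey) then run on cleaned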
>>>
>>>
>>>
>>> On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph <
>>> prabhujose.ga...@gmail.com> wrote:
>>>
>>>> Looking at ExternalSorter.scala line 192
>>>>
>>>> 189
>>>> while (records.hasNext) { addElementsRead() kv = records.next()
>>>> map.changeValue((getPartition(kv._1), kv._1), update)
>>>> maybeSpillCollection(usingMap = true) }
>>>>
>>>> On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru <saurabh.g...@gmail.com> wrote:
>>>>
>>>>> I am seeing the following exception in my Spark cluster every few
>>>>> days in production.
>>>>>
>>>>> 2016-03-12 05:30:00,541 - WARN  TaskSetManager - Lost task 0.0 in
>>>>> stage 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal):
>>>>> java.lang.NullPointerException
>>>>>        at
>>>>> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>>>>>        at
>>>>> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>>>>>        at
>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>>>>        at
>>>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>        at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>>        at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>        at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>        at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>        at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>>
>>>>> I have debugged this on my local machine but haven't been able to
>>>>> pinpoint the cause of the error. Does anyone know why this might
>>>>> occur? Any suggestions?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Saurabh
