Simple: add a debug statement in ExternalSorter.scala just before the line that throws the NPE, recompile the Spark assembly jar, and confirm the source of the null.
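
Something along these lines would do it, a minimal sketch against the Spark 1.x insertAll loop (it assumes ExternalSorter still mixes in Spark's Logging trait, so logError is in scope):

    // Sketch only: a debug guard just before ExternalSorter.scala line 192,
    // to confirm whether the record itself or only its key is null.
    while (records.hasNext) {
      addElementsRead()
      kv = records.next()
      if (kv == null) {
        logError("records.next() returned a null record")
      } else if (kv._1 == null) {
        logError("record has a null key, value = " + kv._2)
      }
      map.changeValue((getPartition(kv._1), kv._1), update)
      maybeSpillCollection(usingMap = true)
    }

With that in place, the executor log will tell you which case fires the next time the task fails.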
On Saturday, March 12, 2016, saurabh guru <saurabh.g...@gmail.com> wrote:

> I don't see how that would be possible. I am reading from a live stream of
> data through kafka.
>
> On Sat 12 Mar, 2016 20:28 Ted Yu, <yuzhih...@gmail.com> wrote:
>
>> Interesting.
>> If kv._1 was null, shouldn't the NPE have come from getPartition() (line
>> 105)?
>>
>> Was it possible that records.next() returned null?
>>
>> On Fri, Mar 11, 2016 at 11:20 PM, Prabhu Joseph <
>> prabhujose.ga...@gmail.com> wrote:
>>
>>> Looking at ExternalSorter.scala line 192, I suspect some input record
>>> has a null key.
>>>
>>> 189     while (records.hasNext) {
>>> 190       addElementsRead()
>>> 191       kv = records.next()
>>> 192       map.changeValue((getPartition(kv._1), kv._1), update)
>>>
>>> On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph <
>>> prabhujose.ga...@gmail.com> wrote:
>>>
>>>> Looking at ExternalSorter.scala line 192:
>>>>
>>>> 189     while (records.hasNext) {
>>>>           addElementsRead()
>>>>           kv = records.next()
>>>>           map.changeValue((getPartition(kv._1), kv._1), update)
>>>>           maybeSpillCollection(usingMap = true)
>>>>         }
>>>>
>>>> On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru <saurabh.g...@gmail.com> wrote:
>>>>
>>>>> I am seeing the following exception in my Spark cluster every few days
>>>>> in production.
>>>>>
>>>>> 2016-03-12 05:30:00,541 - WARN TaskSetManager - Lost task 0.0 in
>>>>> stage 12528.0 (TID 18792, ip-1X-1XX-1-1XX.us-west-1.compute.internal):
>>>>> java.lang.NullPointerException
>>>>>         at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
>>>>>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>>>>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>>>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> I have debugged on a local machine but haven't been able to pinpoint
>>>>> the cause of the error. Does anyone know why this might occur? Any
>>>>> suggestions?
>>>>>
>>>>> Thanks,
>>>>> Saurabh
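
If the Kafka stream can carry records with null keys, a defensive filter ahead of the shuffle would also rule that out on the application side. A minimal, self-contained sketch (Spark 1.x streaming API; the ZooKeeper quorum, group id, topic map, and batch interval below are placeholders, not your job's actual settings):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Drop null records and null keys before any shuffle operation,
    // so ExternalSorter.insertAll never sees them.
    val conf = new SparkConf().setAppName("null-key-guard")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Receiver-based Kafka stream of (key, value) pairs; Kafka messages
    // written without a key arrive here with a null key.
    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group", Map("events" -> 1))

    val cleaned = stream.filter(kv => kv != null && kv._1 != null)

    // Any shuffle downstream (e.g. reduceByKey) now only sees non-null keys.
    cleaned.map { case (k, _) => (k, 1L) }.reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()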