Hi,
Am using Spark, 1.5 in latest EMR 4.1.
I have an RDD of String
scala> deviceIds
res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at
<console>:28
and then when trying to map over the RDD while attempting to run a sql query
the result is a NullPointerException
scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
with the stack trace below. If I run the query as a top level expression the
count is retuned. There was additional code within
the anonymous function that's been removed to try and isolate.
Thanks for any insights or advice on how to debug this.
--
Nick
scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with
200 output partitions
15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at
<console>:40)
15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 36)
15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37
(MapPartitionsRDD[37] at map at <console>:40), which has no missing parents
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with
curMem=531894, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in
memory (estimated size 17.5 KB, free 534.5 MB)
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with
curMem=549798, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes
in memory (estimated size 7.0 KB, free 534.5 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on
10.247.0.117:33555 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at
DAGScheduler.scala:861
15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from
ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40)
15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID
649, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID
650, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on
ip-10-247-0-117.ec2.internal:46227 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on
ip-10-247-0-117.ec2.internal:32938 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID
651, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649,
ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
at
$line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at
$line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID
652, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 1]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID
653, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 2]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID
654, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 3]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID
655, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 4]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID
656, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 5]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID
657, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 6]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID
658, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 7]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID
659, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 8]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID
660, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on
executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null)
[duplicate 9]
15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times;
aborting job
15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40)
failed in 0.128 s
15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took
0.145419 s
15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659,
ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660,
ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have
all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0
(TID 658, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
at $iwC$$iwC$$iwC.<init>(<console>:77)
at $iwC$$iwC.<init>(<console>:79)
at $iwC.<init>(<console>:81)
at <init>(<console>:83)
at .<init>(<console>:87)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on
10.247.0.117:33555 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on
ip-10-247-0-117.ec2.internal:46227 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on
ip-10-247-0-117.ec2.internal:32938 in memory (size: 7.0 KB, free: 535.0 MB)
scala>
Notice: This communication is for the intended recipient(s) only and may
contain confidential, proprietary, legally protected or privileged information
of Turbine, Inc. If you are not the intended recipient(s), please notify the
sender at once and delete this communication. Unauthorized use of the
information in this communication is strictly prohibited and may be unlawful.
For those recipients under contract with Turbine, Inc., the information in this
communication is subject to the terms and conditions of any applicable
contracts or agreements.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]