You can't nest operations on RDDs or DataFrames; for example, you can't create a DataFrame from within a map function. If you explain what you are trying to accomplish, someone can probably suggest another way.
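For what it's worth, here is a rough sketch of one way to restructure this kind of lookup, assuming the goal is to find the ad_info rows for each device id. The NullPointerException most likely arises because sqlContext only exists on the driver and is not usable inside a closure that runs on the executors. The column name device_id, the toy table contents, and the helper adInfoFor are made up for illustration; only ad_info, deviceIds and sqlContext come from the thread. It is written against the Spark 1.5-era API and does a single join on the driver side of the plan instead of one query per id:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object JoinInsteadOfNestedQuery {

  // Hypothetical helper: builds one join against ad_info instead of
  // calling sqlContext.sql(...) once per id inside a map closure.
  // Assumes ad_info has a column named "device_id" (not stated in the thread).
  def adInfoFor(sqlContext: SQLContext, deviceIds: RDD[String]): DataFrame = {
    import sqlContext.implicits._
    // Wrap each id in a Tuple1 so the RDD converts to a one-column DataFrame.
    val ids = deviceIds.map(Tuple1(_)).toDF("device_id")
    // Inner join on the shared column; Spark plans this as a single job.
    ids.join(sqlContext.table("ad_info"), "device_id")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for the real ad_info table.
    sc.parallelize(Seq(("a", "ad-1"), ("c", "ad-2")))
      .toDF("device_id", "ad")
      .registerTempTable("ad_info")

    val deviceIds = sc.parallelize(Seq("a", "b"))
    // The count is computed by an action called on the driver, not inside a closure.
    println(adInfoFor(sqlContext, deviceIds).count())

    sc.stop()
  }
}
```

If the set of ids is small, another option is to collect() them to the driver and build a single query with a filter on those values. Either way, the DataFrame is created outside of any RDD transformation.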

On Thu, Oct 8, 2015 at 10:10 AM, Afshartous, Nick <[email protected]> wrote:
>
> Hi,
>
> I am using Spark 1.5 on the latest EMR 4.1.
>
> I have an RDD of String
>
> scala> deviceIds
> res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at <console>:28
>
> and when I try to map over the RDD while running a SQL query inside the function, the result is a NullPointerException
>
> scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
>
> with the stack trace below. If I run the query as a top-level expression, the count is returned. There was additional code within the anonymous function that has been removed to try to isolate the problem.
>
> Thanks for any insights or advice on how to debug this.
> --
> Nick
>
>
> scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
> 15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
> 15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with 200 output partitions
> 15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at <console>:40)
> 15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 36)
> 15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
> 15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40), which has no missing parents
> 15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with curMem=531894, maxMem=560993402
> 15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 17.5 KB, free 534.5 MB)
> 15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with curMem=549798, maxMem=560993402
> 15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 7.0 KB, free 534.5 MB)
> 15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 10.247.0.117:33555 (size: 7.0 KB, free: 535.0 MB)
> 15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:861
> 15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40)
> 15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 650, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:46227 (size: 7.0 KB, free: 535.0 MB)
> 15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:32938 (size: 7.0 KB, free: 535.0 MB)
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID 651, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
>     at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID 652, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 1]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID 653, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 2]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID 654, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 3]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID 655, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 4]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID 656, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 5]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID 657, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 6]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 7]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 8]
> 15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
> 15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 9]
> 15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times; aborting job
> 15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
> 15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
> 15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40) failed in 0.128 s
> 15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took 0.145419 s
> 15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
> 15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
> 15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
>     at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
>     at $iwC$$iwC$$iwC.<init>(<console>:77)
>     at $iwC$$iwC.<init>(<console>:79)
>     at $iwC.<init>(<console>:81)
>     at <init>(<console>:83)
>     at .<init>(<console>:87)
>     at .<clinit>(<console>)
>     at .<init>(<console>:7)
>     at .<clinit>(<console>)
>     at $print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>     at org.apache.spark.repl.Main.main(Main.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.NullPointerException
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:88)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
> 15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 10.247.0.117:33555 in memory (size: 7.0 KB, free: 535.0 MB)
> 15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:46227 in memory (size: 7.0 KB, free: 535.0 MB)
> 15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:32938 in memory (size: 7.0 KB, free: 535.0 MB)
>
> scala>
