It looks like you're trying to access an RDD (D) from inside a closure (the parameter to the first map), which isn't possible with the current implementation of Spark. Can you rephrase it so that you don't access D from inside the map call?
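For example, if D is small enough to fit in memory, one workaround (just a sketch, assuming sc is your SparkContext) is to collect D to the driver and broadcast the local copy, then reference the broadcast value inside the closure instead of the RDD:

    val dLocal = D.collect()              // pull D down to a local Array[(Int, Double)]
    val dBroadcast = sc.broadcast(dLocal) // ship the local copy to each executor once

    val DD1 = D.map(d => {
      // dBroadcast.value is a plain Array, so mapping over it inside the closure is fine
      (d._1, dBroadcast.value.map(x => math.sqrt(x._2 * d._2)))
    })

If D is too large to broadcast, D.cartesian(D) followed by a groupByKey is another way to express the all-pairs computation entirely as RDD operations.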
On Mon, Mar 17, 2014 at 10:36 AM, anny9699 <anny9...@gmail.com> wrote:
> Hi,
>
> I met this exception when computing a new RDD from an existing RDD or using
> .count on some RDDs. The following is the situation:
>
> val DD1 = D.map(d => {
>   (d._1, D.map(x => math.sqrt(x._2 * d._2)).toArray)
> })
>
> D is in the format RDD[(Int, Double)] and the error message is:
>
> org.apache.spark.SparkException: Job aborted: Task 14.0:8 failed more than 0
> times; aborting job java.lang.NullPointerException
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
> I also met this kind of problem when using .count() on some RDDs.
>
> Thanks a lot!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-met-when-computing-new-RDD-or-use-count-tp2766.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.