It looks like you're trying to access an RDD ("D") from inside a closure
(the parameter to the first map), which isn't possible with the current
implementation of Spark.  Can you rephrase it so that you don't access D
from inside the map call?
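
For example, if D is small enough to fit in driver memory, one workaround is
to pull it to the driver first and close over the local copy instead -- a
rough sketch, assuming D: RDD[(Int, Double)]:

    // collect() materializes D as a local Array on the driver,
    // so the closure below captures a plain array, not an RDD
    val localD = D.collect()
    val DD1 = D.map(d => (d._1, localD.map(x => math.sqrt(x._2 * d._2))))

If D is too large to collect, D.cartesian(D) followed by a groupByKey would
keep the computation distributed, at the cost of shuffling the full cross
product.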


On Mon, Mar 17, 2014 at 10:36 AM, anny9699 <anny9...@gmail.com> wrote:

> Hi,
>
> I met this exception when computing new RDD from an existing RDD or using
> .count on some RDDs. The following is the situation:
>
> val DD1 = D.map(d => {
>   (d._1, D.map(x => math.sqrt(x._2 * d._2)).toArray)
> })
>
> D is in the format RDD[(Int, Double)] and the error message is:
>
> org.apache.spark.SparkException: Job aborted: Task 14.0:8 failed more than 0 times; aborting job java.lang.NullPointerException
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
> I also met this kind of problem when using .count() on some RDDs.
>
> Thanks a lot!
>
