This is something I have bumped into time and again: the object that
contains your main() should also be serializable; then you won't have this
issue.
For example:

object Test extends Serializable {
  def main(args: Array[String]): Unit = {
    // set up the Spark context
    // read your data
    // create your RDDs (grouped by key)
    // write your RDDs out to HDFS
  }
}
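One more thing worth noting: the stack trace below complains about SparkContext itself, and the quoted snippet passes sc into computeDwt inside a map, so each task tries to serialize the context. Here is a rough sketch of the same pipeline that keeps sc on the driver; it assumes computeDwt can be rewritten to not take the context (the output path and app name are placeholders, and `???` stands in for however the RDD is built):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

object Test extends Serializable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dwt"))

    // Built on the driver, e.g. from a file; elided here.
    val series: RDD[(Int, ArrayBuffer[(Int, Double)])] = ???

    // Do NOT reference sc inside the closure -- only serializable values
    // can be shipped to the tasks, and SparkContext is not one of them.
    // DWTsample itself must be Serializable; computeDwt is assumed here
    // to take only the buffer (no SparkContext argument).
    val kk: RDD[(Int, List[Double])] =
      series.map { case (k, buf) => (k, new DWTsample().computeDwt(buf)) }

    kk.saveAsTextFile("hdfs:///path/to/output") // placeholder path
    sc.stop()
  }
}
```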
HTH,
Thanks
Shivani
On Mon, May 12, 2014 at 2:27 AM, yh18190 <[email protected]> wrote:
> Hi,
>
> I am facing the above exception when I try to apply a method (computeDwt)
> to an RDD[(Int, ArrayBuffer[(Int, Double)])] input.
> I am even using the extends Serialization option to serialize objects in
> Spark. Here is the code snippet.
>
> Could anyone suggest what the problem could be and what should be done
> to overcome this issue?
>
> input: series: RDD[(Int, ArrayBuffer[(Int, Double)])]
> DWTsample (extends Serialization) is a class with a computeDwt function.
> sc: SparkContext
>
> val kk: RDD[(Int, List[Double])] =
>   series.map(t => (t._1, new DWTsample().computeDwt(sc, t._2)))
>
> Error:
> org.apache.spark.SparkException: Job failed:
> java.io.NotSerializableException: org.apache.spark.SparkContext
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:556)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:503)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:361)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441)
>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Job-failed-java-io-NotSerializableException-org-apache-spark-SparkContext-tp5585.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA