I'm trying to flatten an RDD of RDDs. The straightforward approach:
a: RDD[RDD[Int]]
a flatMap { _.collect }
throws a java.lang.NullPointerException at
org.apache.spark.rdd.RDD.collect(RDD.scala:602)
In a more complex scenario I also got:
Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
So I guess this may be related to the SparkContext not being available inside
the closure passed to flatMap.
Are nested RDDs not supported?
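For comparison, here is a minimal sketch of the result I'm after, assuming the inner RDDs can be kept in an ordinary driver-side Seq instead of inside another RDD. The names FlattenSketch and flattenOnDriver are hypothetical, and the local master is only for illustration; the only Spark call it relies on is SparkContext.union. I don't know whether this is the recommended way to do it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FlattenSketch {
  // Hypothetical helper: combines driver-side RDDs into one RDD[Int].
  // The inner RDDs are never referenced from inside a task closure.
  def flattenOnDriver(sc: SparkContext, parts: Seq[RDD[Int]]): RDD[Int] =
    sc.union(parts)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("flatten-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val parts: Seq[RDD[Int]] = Seq(
        sc.parallelize(Seq(1, 2)),
        sc.parallelize(Seq(3)),
        sc.parallelize(Seq(4, 5, 6))
      )
      val flat = flattenOnDriver(sc, parts)
      println(flat.collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

This only covers the case where the inner RDDs are already known on the driver, which is not quite the RDD[RDD[Int]] I started with.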
Thanks,
Cosmin Radoi