I am not sure whether this is more of a Spark question or a plain Scala
question, but I am posting it here.
The code snippet below shows an example of referencing an object from
inside the closure passed to the `rdd.foreachPartition` method.
```
object testing {
  // `foo` itself is marked Serializable
  object foo extends Serializable {
    val v = 42
  }

  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list) // `sc` is the SparkContext provided by spark-shell

  def func = {
    val after = rdd.foreachPartition { it =>
      println(foo.v)
    }
  }
}
```
When I run this code, I get the following exception:
```
Caused by: java.io.NotSerializableException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
Serialization stack:
- object not serializable (class:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
<function1>)
```
It looks like Spark is trying to serialize the `testing` object. Why does
it serialize `testing` when the closure only references `foo`, which is
itself a serializable object?
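From the serialization stack, my rough mental model (which may well be wrong)
is that the compiler turns the lambda into a `Function1` class that keeps an
`$outer` reference to the enclosing instance, something like the hand-written
approximation below (the class and parameter names are mine, not the actual
generated code):
```
// Hand-written approximation of the generated closure class, based on the
// `$outer` field in the serialization stack above. Serializing this function
// would drag the entire enclosing `testing` instance along with it.
class ClosureApprox(outer: testing.type)
    extends (Iterator[Int] => Unit) with Serializable {
  def apply(it: Iterator[Int]): Unit = println(outer.foo.v)
}
```
If that is right, the closure holds on to `testing` even though its body only
touches `foo.v`.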
A more general question: how can I prevent Spark from serializing the
enclosing class or object in which an RDD is defined, while still being
able to pass in functions defined in other classes or objects?
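The only workaround I have come up with so far is to copy the value into a
local val before creating the closure, so the lambda captures the local
binding instead of the enclosing instance. A minimal sketch of what I mean
(the `localV` name is just for illustration):
```
def func = {
  // Copy the field into a local val so the closure captures only `localV`
  // and (hopefully) no longer needs a reference to the enclosing `testing`.
  val localV = foo.v
  rdd.foreachPartition { it =>
    println(localV)
  }
}
```
Is that the recommended pattern, or is there a cleaner way to keep the
enclosing object out of the closure?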
--
Chen Song