I am not sure whether this is a question about Spark or about Scala itself, but I am posting it here.
The code snippet below shows an example of referencing an object from a closure passed to `rdd.foreachPartition`:

```
object testing {
  object foo extends Serializable {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition { it =>
      println(foo.v)
    }
  }
}
```

When I run this code, I get the following exception:

```
Caused by: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
Serialization stack:
	- object not serializable (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
	- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
	- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1, <function1>)
```

It looks like Spark is trying to serialize the enclosing `testing` object. Why does it serialize `testing` when the closure only references `foo`, which is itself Serializable? More generally, how can I prevent Spark from serializing the parent object in which the RDD is defined, while still being able to pass in functions defined in other classes?

-- Chen Song
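P.S. The only workaround I can think of is to copy whatever the closure needs into a local `val` before calling `foreachPartition`, so the closure captures only that local value instead of anything reachable through `testing`. A minimal sketch (same shell setup as above, with `sc` being the spark-shell SparkContext); this sidesteps the problem but does not explain it:

```
def func = {
  // Copy the needed value into a local val first; the closure then
  // captures only this Int, not foo or the enclosing testing object.
  val localV = foo.v
  rdd.foreachPartition { it =>
    println(localV)
  }
}
```

But I would still like to understand why the enclosing object is captured in the first place.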