I am not sure whether this is more of a Spark question or just a Scala
question, but I am posting it here.

The code snippet below shows an example of referencing an object from
within a closure passed to the `rdd.foreachPartition` method.

```
object testing {
  // foo itself is serializable
  object foo extends Serializable {
    val v = 42
  }

  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)  // sc is the SparkContext from the shell

  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)  // only foo is referenced inside the closure
    }
  }
}
```
When I run this code, I get the following exception:

```
Caused by: java.io.NotSerializableException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
Serialization stack:
- object not serializable (class:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
<function1>)
```

It looks like Spark needs to serialize the `testing` object. Why does it
serialize `testing` when the closure only references `foo`, which is
itself serializable?
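
One thing I noticed (though I have not dug into the Spark closure cleaner
to confirm this) is that copying the reference into a local variable first
seems to avoid the exception, presumably because the closure then captures
only the local val instead of the enclosing `testing` instance:

```
def func = {
  val localFoo = foo  // copy the reference into a local val
  val after = rdd.foreachPartition {
    it => println(localFoo.v)  // captures localFoo, not testing's $outer
  }
}
```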

More generally, how can I prevent Spark from serializing the enclosing
class in which the RDD is defined, while still being able to reference
functions and objects defined in other classes?
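
For instance, is moving `foo` out of `testing` to the top level the
recommended pattern? A sketch of what I mean (names unchanged from the
snippet above):

```
// foo is no longer nested inside testing, so referencing it
// should not drag in an $outer pointer to testing
object foo extends Serializable {
  val v = 42
}

object testing {
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    rdd.foreachPartition { it => println(foo.v) }
  }
}
```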

-- 
Chen Song