Reposting the code example:

object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    val after = rdd.foreachPartition { it =>
      println(foo.v)
    }
  }
}
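For what it's worth, the stack trace in the quoted message points at the `$outer` field: the closure `it => println(foo.v)` captures the enclosing `testing` instance, not `foo` itself, so it is `testing` whose serializability matters. Below is a minimal sketch of the workaround from Erik's linked post, which sidesteps the capture entirely by copying the needed value into a local val first (this assumes the same spark-shell session as the examples in this thread, where `sc` is the SparkContext; `localV` is just an illustrative name):

```
object testing {  // no longer needs to extend Serializable
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Copy the needed value into a local val; the closure below then
    // captures only this Int and holds no $outer reference to the
    // enclosing `testing` instance.
    val localV = foo.v
    rdd.foreachPartition { it =>
      println(localV)
    }
  }
}
```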
On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <chen.song...@gmail.com> wrote:

> Thanks Erik. I saw the document too. That is why I am confused: as per the
> article, it should be fine as long as *foo* is serializable. However, what
> I have seen is that it works if *testing* is serializable, even if foo is
> not serializable, as shown below. I don't know if there is something
> specific to Spark.
>
> For example, the code example below works.
>
> object testing extends Serializable {
>   object foo {
>     val v = 42
>   }
>   val list = List(1, 2, 3)
>   val rdd = sc.parallelize(list)
>   def func = {
>     val after = rdd.foreachPartition { it =>
>       println(foo.v)
>     }
>   }
> }
>
> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:
>
>> I think you have stumbled across this idiosyncrasy:
>>
>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>
>> ----- Original Message -----
>> > I am not sure whether this is more of a question about Spark or about
>> > Scala, but I am posting my question here.
>> >
>> > The code snippet below shows an example of passing a reference to a
>> > closure in the rdd.foreachPartition method.
>> >
>> > ```
>> > object testing {
>> >   object foo extends Serializable {
>> >     val v = 42
>> >   }
>> >   val list = List(1, 2, 3)
>> >   val rdd = sc.parallelize(list)
>> >   def func = {
>> >     val after = rdd.foreachPartition { it =>
>> >       println(foo.v)
>> >     }
>> >   }
>> > }
>> > ```
>> >
>> > When running this code, I got an exception:
>> >
>> > ```
>> > Caused by: java.io.NotSerializableException:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>> > Serialization stack:
>> > - object not serializable (class:
>> >   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>> >   $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>> > - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> >   name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>> > - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> >   <function1>)
>> > ```
>> >
>> > It looks like Spark needs to serialize the `testing` object. Why is it
>> > serializing `testing` even though I only reference `foo` (another
>> > serializable object) in the closure?
>> >
>> > A more general question: how can I prevent Spark from serializing the
>> > parent class in which the RDD is defined, while still supporting passing
>> > in functions defined in other classes?
>> >
>> > --
>> > Chen Song
>
> --
> Chen Song

--
Chen Song
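On the closing question (keeping Spark from serializing the parent object in which the RDD is defined), another pattern that may help, sketched here for compiled code rather than the spark-shell: define the referenced object at the top level. A top-level Scala object is accessed statically, so the closure captures no enclosing instance at all. This is a sketch under stated assumptions, not a definitive fix; `func` taking the SparkContext as a parameter is an illustrative choice.

```
import org.apache.spark.SparkContext

object foo extends Serializable {  // top-level; Serializable kept from the
  val v = 42                       // original example, though static access
}                                  // means foo is never actually shipped

object testing {
  val list = List(1, 2, 3)
  def func(sc: SparkContext) = {
    val rdd = sc.parallelize(list)
    rdd.foreachPartition { it =>
      // `foo` resolves to the static module instance, so nothing from
      // `testing` is captured and no NotSerializableException occurs.
      println(foo.v)
    }
  }
}
```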