Reposting the code example:

```
object testing extends Serializable {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)  // sc: the SparkContext provided by spark-shell
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
```
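
For completeness, a minimal sketch of the usual workaround along the lines of
Erik's post below: copy the value into a local val before the closure, so the
closure captures only that value and not the enclosing object. This assumes
`sc` is the SparkContext provided by spark-shell, and `capturedV` is just an
illustrative name:

```
object testing {
  object foo {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)
  def func = {
    // Copy foo.v into a local val: the closure then captures only
    // this Int, not foo, and not the enclosing testing object.
    val capturedV = foo.v
    rdd.foreachPartition { it =>
      println(capturedV)
    }
  }
}
```

With this, neither testing nor foo needs to extend Serializable, since the
serialized closure holds only the captured Int.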

On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <chen.song...@gmail.com> wrote:

> Thanks Erik. I saw the document too. That is why I am confused: as per
> the article, it should be fine as long as *foo* is serializable. However,
> what I have seen is that it works if *testing* is serializable, even if
> *foo* is not, as shown below. I don't know if there is something specific
> to Spark here.
>
> For example, the code below works.
>
> object testing extends Serializable {
>   object foo {
>     val v = 42
>   }
>   val list = List(1, 2, 3)
>   val rdd = sc.parallelize(list)
>   def func = {
>     val after = rdd.foreachPartition {
>       it => println(foo.v)
>     }
>   }
> }
>
> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <e...@redhat.com> wrote:
>
>> I think you have stumbled across this idiosyncrasy:
>>
>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>
>> ----- Original Message -----
>> > I am not sure if this is more of a question for Spark or just Scala,
>> > but I am posting my question here.
>> >
>> > The code snippet below shows an example of passing a reference into a
>> > closure used in the rdd.foreachPartition method.
>> >
>> > ```
>> > object testing {
>> >     object foo extends Serializable {
>> >       val v = 42
>> >     }
>> >     val list = List(1,2,3)
>> >     val rdd = sc.parallelize(list)
>> >     def func = {
>> >       val after = rdd.foreachPartition {
>> >         it => println(foo.v)
>> >       }
>> >     }
>> >   }
>> > ```
>> > When running this code, I got the following exception:
>> >
>> > ```
>> > Caused by: java.io.NotSerializableException:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>> > Serialization stack:
>> > - object not serializable (class:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>> > - field (class:
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> > name: $outer, type: class
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>> > - object (class
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> > <function1>)
>> > ```
>> >
>> > It looks like Spark needs to serialize the `testing` object. Why is it
>> > serializing testing even though I only reference foo (a serializable
>> > object) in the closure?
>> >
>> > A more general question: how can I prevent Spark from serializing the
>> > parent class where the RDD is defined, while still being able to pass in
>> > functions defined in other classes?
>> >
>> > --
>> > Chen Song
>> >
>>
>
>
>
> --
> Chen Song
>
>


-- 
Chen Song
