Hi,

I am observing odd behavior of Spark and closures in local mode on my
machine versus a 3-node cluster (Spark 2.4.5).

Following is the piece of code:

object Example {
  val num = 5
  def myfunc = {
    // sc is the SparkContext (e.g. the one provided by spark-shell)
    sc.parallelize(1 to 4).map(_ + num).foreach(println)
  }
}

I expected this to fail regardless of mode, since the *num* field is needed
in the closure and therefore the *Example* object would need to be
serialized, but it cannot be, since it does not extend the *Serializable*
interface. (The usual workaround of copying the field into a local val is
sketched near the end of this mail.)

   - When I run the same piece of code from spark-shell on my local
   machine, it fails with the error expected from the rationale above
   (screenshot: <https://i.stack.imgur.com/KgCRU.png>).
   - When I run the same piece of code *in yarn mode* on a 3-node EMR
   cluster, it fails with the exact same error as in the screenshot above,
   again matching the rationale.
   - When I run the same piece of code *in local mode* on the same cluster
   (i.e. on the master node), it also fails. The same rationale still holds.
   - However, when I run it from an sbt project in local mode *(not a Spark
   installation or anything; I just added the Spark libraries to my sbt
   project and used conf.master(local[..]))*, roughly the setup sketched
   right after this list, it runs fine and gives me an output of 6, 7, 8, 9
   (screenshot: <https://i.stack.imgur.com/yUCdp.png>).
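
For reference, the sbt/local-mode run looks roughly like the sketch below
(the object and app names are placeholders, and I am building the context
through SparkSession here just for illustration; a SparkConf with the master
set to local[*] behaves the same way):

import org.apache.spark.sql.SparkSession

// Placeholder object name; an approximation of the sbt local-mode setup.
object ExampleLocal {
  val num = 5

  def main(args: Array[String]): Unit = {
    // Spark comes in purely as an sbt library dependency; no installation.
    val spark = SparkSession.builder()
      .appName("closure-local-test")   // placeholder app name
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Same closure as in the snippet above; in this setup it prints 6 7 8 9.
    sc.parallelize(1 to 4).map(_ + num).foreach(println)

    spark.stop()
  }
}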

This means it fails everywhere except when you run it by adding the Spark
dependencies to an sbt project, where it works. The question is: what
explains the different local-mode behavior when you run your Spark code
simply by adding the Spark libraries to an sbt project?
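
For completeness, the usual workaround I am aware of is to copy the field
into a local val before building the RDD, so the closure captures only that
value and the enclosing object never has to be serialized. A minimal sketch
(ExampleSafe and localNum are just placeholder names):

import org.apache.spark.SparkContext

object ExampleSafe {
  val num = 5

  def myfunc(sc: SparkContext): Unit = {
    // Copy the field into a local val; the closure below captures only this
    // Int value, so Spark never needs to serialize the ExampleSafe object.
    val localNum = num
    sc.parallelize(1 to 4).map(_ + localNum).foreach(println)
  }
}

My question, though, is about the behavior difference itself, not about how
to avoid the error.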


Regards,

Sheel
