Yes, as you found, in local mode Spark won't serialize your objects; it just passes a reference into the closure. This means it is possible to write code that works in local mode but breaks when you run it distributed.
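
The classic illustration of this is the counter example from the closures section of the Spark programming guide (a sketch, not code from this thread; it assumes an existing SparkContext named sc):

var counter = 0
val rdd = sc.parallelize(1 to 4)

// In local mode the tasks may execute in the same JVM as the driver and see
// the very same `counter`, so this can print 10. On a cluster each task works
// on a serialized copy of the closure, so the driver-side counter stays 0.
rdd.foreach(x => counter += x)
println("counter = " + counter)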

From: Sheel Pancholi <sheelst...@gmail.com>
Date: Friday, February 26, 2021 at 4:24 AM
To: user <user@spark.apache.org>
Subject: [EXTERNAL] Spark closures behavior in local mode in IDEs


Hi,


I am observing odd behavior of Spark closures in local mode on my machine vs. a 3-node cluster (Spark 2.4.5).

Following is the piece of code:

object Example {

  val num = 5

  // sc is the SparkContext (provided by spark-shell, or created in the
  // sbt/local-mode application)
  def myfunc = {
    sc.parallelize(1 to 4).map(_ + num).foreach(println)
  }
}

I expected this to fail regardless of where it runs: num is needed in the closure, so the enclosing Example object would have to be serialized, but it cannot be, since it does not extend the Serializable interface.
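
(For reference, the standard workaround I know of is to copy the field into a local val so that only that value, a plain Int, is captured by the closure. A sketch with illustrative names follows; this is not the code under test here.)

import org.apache.spark.SparkContext

object ExampleFixed {
  val num = 5
  def myfunc(sc: SparkContext) = {
    // local copy: the closure captures only this Int, not the enclosing object
    val localNum = num
    sc.parallelize(1 to 4).map(_ + localNum).foreach(println)
  }
}
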
·  When I run the same piece of code from spark-shell on my local machine, it fails with the serialization error, as expected from the rationale above (screenshot: https://i.stack.imgur.com/KgCRU.png).

·  When I run the same piece of code in yarn mode on a 3-node EMR cluster, it fails with the exact same error as in the screenshot above, for the same reason.

·  When I run the same piece of code in local mode on the same cluster (i.e. on the master node), it also fails; the same rationale still holds.

·  However, when I run it from an sbt project (not a Spark installation or anything, just Spark libraries added as dependencies and conf.master(local[..]) used for local mode), it runs fine and gives me an output of 6, 7, 8, 9 (screenshot: https://i.stack.imgur.com/yUCdp.png).

So the code behaves as I expect (it fails) everywhere except when it is run from an sbt project with the Spark dependencies added. The question is: what explains the different local-mode behavior when you run your Spark code simply by adding the Spark libraries to an sbt project?
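
For reference, the sbt-project run is set up roughly like this (a sketch; app name and exact build details are omitted, with Spark 2.4.5 on the sbt classpath):

import org.apache.spark.{SparkConf, SparkContext}

object Example {
  val num = 5

  def main(args: Array[String]): Unit = {
    // local master set directly on the conf, as described above
    val conf = new SparkConf().setAppName("closure-local-test").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 4).map(_ + num).foreach(println)   // prints 6 7 8 9 here (order may vary)
    sc.stop()
  }
}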



Regards,

Sheel
