Hello Spark developers,

Can anyone shed some light on the life cycle of broadcast variables?
Specifically, suppose I define a broadcast variable inside a loop and give
it a different value on each iteration.
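// For example:
// (toDF needs the SQL implicits in scope, e.g. import spark.implicits._ in Spark 2.x)
for (i <- 1 to 10) {
    val bc = sc.broadcast(i)  // a new broadcast variable on every iteration
    sc.parallelize(Seq(1, 2, 3))
      .map { id => (id, bc.value) }
      .toDF("id", "i")
      .write.mode("overwrite")  // the same path is rewritten on each pass
      .parquet("/dummy_output")
}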

Do I need to actively manage the broadcast variable in this case? I know this
example is not realistic, but please imagine that the broadcast variable holds
an array of 1M Longs.
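
To make the question concrete, this is roughly what I mean by "actively
managing" the variable. It is only a sketch, assuming Spark 2.x, where
Broadcast.unpersist() (drops the cached copies on the executors) and
Broadcast.destroy() (also drops the driver-side data, after which the variable
can no longer be used) are public. The object name, the run helper, and the
local[2] master are made up for illustration, and collect() stands in for the
parquet write so the snippet is self-contained. Whether this explicit cleanup
is needed at all is exactly what I am asking.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration, not taken from any real application.
object BroadcastLifecycleSketch {

  // Runs one job per iteration, each with its own broadcast variable,
  // and releases that variable explicitly once the job has finished.
  def run(spark: SparkSession, iterations: Int): Seq[Seq[(Int, Int)]] = {
    val sc = spark.sparkContext
    (1 to iterations).map { i =>
      val bc = sc.broadcast(i)
      try {
        // collect() is an action, so the job is done when it returns.
        sc.parallelize(Seq(1, 2, 3)).map(id => (id, bc.value)).collect().toSeq
      } finally {
        // Remove the broadcast data from the executors and the driver.
        // bc must not be used after this call.
        bc.destroy()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]") // assumption: local mode, for illustration only
      .appName("broadcast-lifecycle-sketch")
      .getOrCreate()
    try run(spark, 10).foreach(println)
    finally spark.stop()
  }
}
```

If the same value had to be reused by a later job, bc.unpersist() would be the
milder choice, since the variable stays usable and is simply sent to the
executors again on next use.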

Regards,

Jerry



On Sun, Aug 21, 2016 at 1:07 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hello spark developers,
>
> Can someone explain to me the lifecycle of a broadcast variable?
> When will a broadcast variable be garbage-collected on the driver side and
> on the executor side? Does a Spark application need to actively manage its
> broadcast variables to ensure that it will not run into OOM?
>
> Best Regards,
>
> Jerry
>
