Hello Spark developers,

Can anyone shed some light on the life cycle of broadcast variables? Basically, suppose I define a broadcast variable inside a loop and give it a different value on each iteration.

// For example:
    for (i <- 1 to 10) {
      val bc = sc.broadcast(i)
      sc.parallelize(Seq(1, 2, 3))
        .map { id => val i = bc.value; (id, i) }
        .toDF("id", "i")
        .write.parquet("/dummy_output")
    }
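
To make the question concrete, here is roughly what I imagine "managing" the variable explicitly would look like. This is only a sketch, assuming Spark 2.x with a SparkSession. The object name, the app name, and the overwrite mode are my own placeholders, and I am not claiming this is required or the recommended pattern.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver program; the names here are placeholders for illustration.
object BroadcastLoopSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-loop-sketch").getOrCreate()
    import spark.implicits._ // needed for toDF on an RDD of tuples
    val sc = spark.sparkContext

    for (i <- 1 to 10) {
      val bc = sc.broadcast(i)
      try {
        sc.parallelize(Seq(1, 2, 3))
          .map { id => (id, bc.value) }
          .toDF("id", "i")
          .write
          .mode("overwrite") // assumption: overwrite so repeated writes to the same path do not fail
          .parquet("/dummy_output")
      } finally {
        // Remove the cached copies on the executors once the job has finished.
        // The driver still holds the value, so bc would be re-sent if used again.
        bc.unpersist(blocking = true)
        // bc.destroy() would also release the driver-side state; bc is unusable afterwards.
      }
    }

    spark.stop()
  }
}
```

The try/finally is there only so that the broadcast is released even if the write fails.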

Do I need to actively manage the broadcast variable in this case? I know this example is not realistic, but please imagine that the broadcast variable holds an array of 1M Longs.

Regards,

Jerry

On Sun, Aug 21, 2016 at 1:07 PM, Jerry Lam <chiling...@gmail.com> wrote:
> Hello spark developers,
>
> Can someone explain to me what is the lifecycle of a broadcast variable?
> When a broadcast variable will be garbage-collected at the driver-side and
> at the executor-side? Does a spark application need to actively manage the
> broadcast variables to ensure that it will not run in OOM?
>
> Best Regards,
>
> Jerry