To answer my own question, that does seem to be the right way. I was concerned about whether the data held by a broadcast variable would end up getting serialized if I used it as an instance variable of the function. I realized that doesn't happen, because the broadcast variable's value is marked as transient.
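To illustrate the mechanism, here is a minimal sketch in plain Java, without Spark on the classpath. FakeBroadcast is a hypothetical stand-in I made up for this example, not Spark's Broadcast class; it only mimics the one relevant property, a transient value field next to a small id. The helper names (serialize, deserialize, mapOfSize) are also just for illustration. As far as I can tell, the real implementations additionally re-fetch the value on the receiving side, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class TransientBroadcastSketch {

    // Hypothetical stand-in for a broadcast handle; NOT Spark's Broadcast<T>.
    static final class FakeBroadcast<T> implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long id;       // small, travels with the function
        private transient T value;   // skipped by Java serialization

        FakeBroadcast(long id, T value) {
            this.id = id;
            this.value = value;
        }

        long id() { return id; }
        T value() { return value; }
    }

    // Function object that keeps the broadcast handle as an instance variable.
    static final class SomeFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakeBroadcast<Map<String, Integer>> var;

        SomeFunction(FakeBroadcast<Map<String, Integer>> var) {
            this.var = var;
        }

        FakeBroadcast<Map<String, Integer>> broadcast() { return var; }
    }

    // Serializes an object with standard Java serialization.
    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back an object written by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a map with n entries to act as the "broadcast" data.
    static Map<String, Integer> mapOfSize(int n) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) {
            m.put("key" + i, i);
        }
        return m;
    }

    public static void main(String[] args) {
        SomeFunction small = new SomeFunction(new FakeBroadcast<>(1L, mapOfSize(1)));
        SomeFunction large = new SomeFunction(new FakeBroadcast<>(1L, mapOfSize(100_000)));

        // The map is transient, so its size should not affect the serialized function.
        System.out.println("small function bytes: " + serialize(small).length);
        System.out.println("large function bytes: " + serialize(large).length);

        // After a round trip the id survives but the transient value is gone.
        SomeFunction copy = (SomeFunction) deserialize(serialize(large));
        System.out.println("id after round trip: " + copy.broadcast().id());
        System.out.println("value after round trip: " + copy.broadcast().value());
    }
}
```

The point of the sketch is only that holding the handle as a field does not drag the map into the serialized function; how the value reaches the executors is left to the broadcast implementation.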
For reference, the two broadcast implementations:

1. Http - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
2. Torrent - https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala

On Thu, May 22, 2014 at 6:58 PM, Puneet Lakhina <puneet.lakh...@gmail.com> wrote:

> Hi,
>
> I'm confused about the right way to use broadcast variables from Java.
>
> My code looks something like this:
>
> Map<> val = //build Map to be broadcast
> Broadcast<Map<>> broadcastVar = sc.broadcast(val);
>
>
> sc.textFile(...).map(new SomeFunction()) {
>     //Do something here using broadcastVar
> }
>
> My question is, should I pass the broadcastVar to the SomeFunction as a
> constructor parameter that it can keep around as an instance variable, i.e.
>
> sc.textFile(...).map(new SomeFunction(broadcastVar)) {
>     //Do something here using broadcastVar
> }
>
> class SomeFunction extends Function<T> {
>     public SomeFunction(Broadcast<Map<>> var) {
>         this.var = var;
>     }
>
>     public T call() {
>         //Do something
>     }
> }
>
> Is the above the right way to utilize broadcast variables when not using
> anonymous inner classes as functions?
> --
> Regards,
> Puneet
>
>

--
Regards,
Puneet