Re: Broadcast Variables

2018-06-25 Thread mrsanketh
Issue: Not able to broadcast or place the files locally on the Spark worker nodes from a Spark application in cluster deploy mode. The Spark job always throws FileNotFoundException. Issue Description: We are trying to access a Kafka cluster that is configured with SSL for encryption from Spark Streami

Re: Broadcast variables in R

2015-07-22 Thread FRANCHOIS Serge
Thank you very much Shivaram. I’ve got it working on Mac now by specifying the namespace: using SparkR:::parallelize() instead of just parallelize(). Wkr, Serge On 21 Jul 2015, at 17:20, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote: There shouldn't be anything Mac OS specific abou

Re: Broadcast variables in R

2015-07-21 Thread Shivaram Venkataraman
There shouldn't be anything Mac OS specific about this feature. One point of warning though -- As mentioned previously in this thread the APIs were made private because we aren't sure we will be supporting them in the future. If you are using these APIs it would be good to chime in on the JIRA with

Re: Broadcast variables in R

2015-07-21 Thread Serge Franchois
I might add to this that I've done the same exercise on Linux (CentOS 6), and there broadcast variables ARE working. Is this functionality perhaps not exposed on Mac OS X? Or does it have to do with the fact that there are no native Hadoop libs for Mac?

Re: Broadcast variables in R

2015-07-20 Thread Eskilson,Aleksander
Hi Serge, The broadcast function was made private when SparkR merged into Apache Spark for the 1.4.0 release. You can still use broadcast by specifying the private namespace though. SparkR:::broadcast(sc, obj) The RDD methods were considered very low-level, and the SparkR devs are still figuring

Re: Broadcast variables can be rebroadcast?

2015-06-03 Thread NB
I am pasting some of the exchanges I had on this topic via the mailing list directly so it may help someone else too. (Don't know why those responses don't show up here). --- Thanks Imran. It does help clarify. I believe I had it right all along then but was confus

Re: Broadcast variables can be rebroadcast?

2015-05-19 Thread N B
Thanks Imran. It does help clarify. I believe I had it right all along then but was confused by documentation talking about never changing the broadcasted variables. So far I've only tried it in a local-mode process, and it does seem to work as intended. When (and if!) we start running on a real cluster

Re: Broadcast variables can be rebroadcast?

2015-05-19 Thread Imran Rashid
Hmm, I guess it depends on the way you look at it. In a way, I'm saying that Spark does *not* have any built-in "auto-re-broadcast" if you try to mutate a broadcast variable. Instead, you should create something new, and just broadcast it separately. Then just have all the code you have operatin

Re: Broadcast variables can be rebroadcast?

2015-05-19 Thread N B
Hi Imran, If I understood you correctly, you are suggesting to simply call broadcast again from the driver program. This is exactly what I am hoping will work as I have the Broadcast data wrapped up and I am indeed (re)broadcasting the wrapper over again when the underlying data changes. However,

Re: Broadcast variables can be rebroadcast?

2015-05-18 Thread Imran Rashid
Rather than "updating" the broadcast variable, can't you simply create a new one? When the old one can be gc'ed in your program, it will also get gc'ed from Spark's cache (and all executors). I think this will make your code *slightly* more complicated, as you need to add in another layer of indi
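The "create a new one" suggestion above can be sketched without Spark at all. The snippet below is only an illustration of the extra layer of indirection: `FakeBroadcast` and `BroadcastHolder` are hypothetical names invented here, not Spark classes, and in a real job the handle would presumably come from `sparkContext.broadcast()` instead.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class FakeBroadcast:
    """Hypothetical stand-in for a broadcast handle: never reassigned once created."""
    id: int
    value: Any


class BroadcastHolder:
    """Hypothetical indirection layer: callers ask the holder for the current handle."""

    def __init__(self, initial: Any) -> None:
        self._next_id = 0
        self._current = self._make(initial)

    def _make(self, value: Any) -> FakeBroadcast:
        handle = FakeBroadcast(self._next_id, value)
        self._next_id += 1
        return handle

    def current(self) -> FakeBroadcast:
        return self._current

    def replace(self, new_value: Any) -> FakeBroadcast:
        # Create a brand-new handle instead of mutating the old one;
        # the old handle becomes garbage once nothing references it.
        self._current = self._make(new_value)
        return self._current


def run_batch(records: List[str], lookup: FakeBroadcast) -> List[int]:
    # Each batch uses whichever handle was current when it was launched.
    return [lookup.value.get(r, -1) for r in records]


holder = BroadcastHolder({"a": 1})
first = run_batch(["a", "b"], holder.current())
holder.replace({"a": 1, "b": 2})
second = run_batch(["a", "b"], holder.current())
print(first, second)
```

The point of the holder is that work launched before `replace()` keeps the handle it was given, while later work picks up the new one; nothing ever mutates a handle that has already been handed out.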

Re: Broadcast variables can be rebroadcast?

2015-05-16 Thread N B
Thanks Ayan. Can we rebroadcast after updating in the driver? Thanks NB. On Fri, May 15, 2015 at 6:40 PM, ayan guha wrote: > Hi > > a broadcast variable is shipped to the executors used by a > transformation the first time it is accessed in that transformation. It will NOT be > updated subsequ

Re: Broadcast variables can be rebroadcast?

2015-05-15 Thread ayan guha
Hi, a broadcast variable is shipped to the executors used by a transformation the first time it is accessed in that transformation. It will NOT be updated subsequently, even if the value has changed. However, the new value will be shipped to any new executor that comes into play after the value has change
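As a rough analogy for the behaviour described above (not Spark's actual shipping mechanism, which uses its own serializers and block transfer), a value that crosses a process boundary is a serialized copy. The sketch below uses the standard `pickle` module to stand in for that copy; the variable names are made up for illustration.

```python
import pickle

# Value as it exists on the "driver" side.
driver_value = {"threshold": 10}

# First access: a serialized snapshot is shipped and rebuilt on the "executor".
executor_copy = pickle.loads(pickle.dumps(driver_value))

# Later mutation on the driver does not reach the copy that was already shipped.
driver_value["threshold"] = 99

# An "executor" that first receives the value after the change sees the new state.
late_copy = pickle.loads(pickle.dumps(driver_value))

print(executor_copy["threshold"], late_copy["threshold"])  # prints: 10 99
```

This may also be why in-place mutation can appear to work in a single local process, where no copy is necessarily made, and then behave differently on a real cluster.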

Re: Broadcast variables can be rebroadcast?

2015-05-15 Thread Ilya Ganelin
Nope. It will just work when you call x.value. On Fri, May 15, 2015 at 5:39 PM N B wrote: > Thanks Ilya. Does one have to call broadcast again once the underlying > data is updated in order to get the changes visible on all nodes? > > Thanks > NB > > > On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin

Re: Broadcast variables can be rebroadcast?

2015-05-15 Thread N B
Thanks Ilya. Does one have to call broadcast again once the underlying data is updated in order to get the changes visible on all nodes? Thanks NB On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin wrote: > The broadcast variable is like a pointer. If the underlying data changes > then the changes

Re: Broadcast variables can be rebroadcast?

2015-05-15 Thread Ilya Ganelin
The broadcast variable is like a pointer. If the underlying data changes then the changes will be visible throughout the cluster. On Fri, May 15, 2015 at 5:18 PM NB wrote: > Hello, > > Once a broadcast variable is created using sparkContext.broadcast(), can it > ever be updated again? The use cas

Re: Broadcast Variables

2014-05-27 Thread Puneet Lakhina
To answer my own question, that does seem to be the right way. I was concerned about whether the data that a broadcast variable holds would end up getting serialized if I used it as an instance variable of the function. I realized that doesn't happen because the broadcast variable's value is marked as tra