Issue:
We are not able to broadcast files, or place them locally on the Spark worker
nodes, from a Spark application in cluster deploy mode. The Spark job always
throws a FileNotFoundException.
Issue Description:
We are trying to access a Kafka cluster, configured with SSL for
encryption, from Spark Streaming
Thank you very much Shivaram. I’ve got it working on Mac now by specifying the
namespace.
Using SparkR:::parallelize() instead of just parallelize()
Wkr,
Serge
On 21 Jul 2015, at 17:20, Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:
There shouldn't be anything Mac OS specific about this feature. One point
of warning though -- As mentioned previously in this thread the APIs were
made private because we aren't sure we will be supporting them in the
future. If you are using these APIs it would be good to chime in on the
JIRA with
I might add to this that I've done the same exercise on Linux (CentOS 6), and
there, broadcast variables ARE working. Is this functionality perhaps not
exposed on Mac OS X? Or does it have to do with the fact that there are no
native Hadoop libs for Mac?
--
Hi Serge,
The broadcast function was made private when SparkR merged into Apache
Spark for the 1.4.0 release. You can still use broadcast by specifying the
private namespace though.
SparkR:::broadcast(sc, obj)
The RDD methods were considered very low-level, and the SparkR devs are
still figuring
I am pasting some of the exchanges I had on this topic via the mailing list
directly so it may help someone else too. (Don't know why those responses
don't show up here).
---
Thanks Imran. It does help clarify. I believe I had it right all along then
but was confused by documentation talking about never changing the
broadcasted variables.
I've tried it in a local-mode process so far and it does seem to work as
intended. When (and if!) we start running on a real cluster
hmm, I guess it depends on the way you look at it. In a way, I'm saying
that spark does *not* have any built in "auto-re-broadcast" if you try to
mutate a broadcast variable. Instead, you should create something new, and
just broadcast it separately. Then just have all the code you have
operating
Hi Imran,
If I understood you correctly, you are suggesting to simply call broadcast
again from the driver program. This is exactly what I am hoping will work
as I have the Broadcast data wrapped up and I am indeed (re)broadcasting
the wrapper over again when the underlying data changes. However,
Rather than "updating" the broadcast variable, can't you simply create a
new one? When the old one can be gc'ed in your program, it will also get
gc'ed from spark's cache (and all executors).
I think this will make your code *slightly* more complicated, as you need
to add in another layer of indirection
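The "new broadcast plus one layer of indirection" suggestion can be sketched without Spark at all. In the sketch below, FakeBroadcast, broadcast(), and the current slot are hypothetical stand-ins for illustration, not Spark APIs:

```python
# Hypothetical stand-ins, not real Spark APIs: a minimal sketch of
# "re-broadcast a new variable behind a layer of indirection".
class FakeBroadcast:
    """Mimics a Broadcast[T]: a snapshot taken once, at broadcast time."""
    def __init__(self, value):
        self._value = value  # in real Spark this is shipped to executors once

    @property
    def value(self):
        return self._value


def broadcast(value):
    return FakeBroadcast(value)


# The indirection: tasks read through a mutable slot holding the *current*
# broadcast handle, rather than capturing one handle forever.
current = {"bc": broadcast({"model_version": 1})}

def task():
    return current["bc"].value["model_version"]

assert task() == 1
current["bc"] = broadcast({"model_version": 2})  # a new broadcast, not a mutation
assert task() == 2
```

The old FakeBroadcast becomes unreachable after the swap, which mirrors the point about the old broadcast getting gc'ed from Spark's cache once the program drops it.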
Thanks Ayan. Can we rebroadcast after updating in the driver?
Thanks
NB.
On Fri, May 15, 2015 at 6:40 PM, ayan guha wrote:
> Hi
>
> Broadcast variables are shipped the first time they are accessed in a
> transformation, to the executors used by that transformation. They will NOT
> be updated subsequ
Hi
Broadcast variables are shipped the first time they are accessed in a
transformation, to the executors used by that transformation. They will NOT be
updated subsequently, even if the value has changed. However, a new value will
be shipped to any new executor that comes into play after the value has
changed.
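The ship-on-first-access behavior described above can be modeled in a few lines of plain Python. Executor and driver_value below are illustrative names for the sketch, not Spark APIs:

```python
# Toy model of broadcast shipping semantics; not Spark code.
driver_value = {"v": 1}

class Executor:
    def __init__(self):
        self.cache = None

    def access(self):
        if self.cache is None:          # first access: value is shipped
            self.cache = dict(driver_value)
        return self.cache               # later accesses hit the local copy

e1 = Executor()
assert e1.access()["v"] == 1

driver_value["v"] = 2                   # driver-side change afterwards
assert e1.access()["v"] == 1            # existing executor is NOT updated

e2 = Executor()                         # executor that comes into play later
assert e2.access()["v"] == 2            # ships the value as it is *now*
```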
Nope. It will just work when you call x.value.
On Fri, May 15, 2015 at 5:39 PM N B wrote:
> Thanks Ilya. Does one have to call broadcast again once the underlying
> data is updated in order to get the changes visible on all nodes?
>
> Thanks
> NB
>
>
> On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin
Thanks Ilya. Does one have to call broadcast again once the underlying data
is updated in order to get the changes visible on all nodes?
Thanks
NB
On Fri, May 15, 2015 at 5:29 PM, Ilya Ganelin wrote:
> The broadcast variable is like a pointer. If the underlying data changes
> then the changes
The broadcast variable is like a pointer. If the underlying data changes
then the changes will be visible throughout the cluster.
On Fri, May 15, 2015 at 5:18 PM NB wrote:
> Hello,
>
> Once a broadcast variable is created using sparkContext.broadcast(), can it
> ever be updated again? The use cas
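One caveat on the "pointer" description above: it only behaves that way when driver and executors share a single process (e.g. local mode); an executor holding a serialized copy will not see later driver-side mutations. A pickle-based sketch of that distinction, in plain Python rather than Spark:

```python
import pickle

# A remote executor works on a serialized copy, so later driver-side
# mutations of the underlying data are invisible to it.
data = {"threshold": 10}
shipped = pickle.loads(pickle.dumps(data))   # what a remote process receives

data["threshold"] = 99                       # driver mutates the underlying data

local_ref = data                             # local mode: same process, same object
assert local_ref["threshold"] == 99          # mutation IS visible locally
assert shipped["threshold"] == 10            # the remote copy never sees it
```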
To answer my own question, that does seem to be the right way. I was concerned
about whether the data held by a broadcast variable would end up getting
serialized if I used the variable as an instance variable of the function. I
realized that doesn't happen, because the broadcast variable's value is marked
as transient.
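The transient trick mentioned above can be imitated in Python with pickle's __getstate__/__setstate__ hooks. BroadcastHandle and its fields below are an illustrative sketch of the idea, not Spark's actual internals:

```python
import pickle

class BroadcastHandle:
    """Sketch: the payload is excluded from serialization, so closures carry
    only a lightweight handle (illustrative names, not Spark internals)."""

    def __init__(self, bc_id, value):
        self.bc_id = bc_id
        self._value = value              # big payload, cached locally

    def __getstate__(self):
        return {"bc_id": self.bc_id}     # only the id travels with closures

    def __setstate__(self, state):
        self.bc_id = state["bc_id"]
        self._value = None               # a real executor would re-fetch it

h = BroadcastHandle(42, list(range(100000)))
restored = pickle.loads(pickle.dumps(h))
assert restored.bc_id == 42
assert restored._value is None           # the payload was not serialized
assert len(pickle.dumps(h)) < 200        # tiny, despite the large value
```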