[...] memory?
>
> On Mar 7, 2016, at 8:28 PM, Arash wrote:
>
> So I just implemented the logic through a standard join (without collect
> and broadcast) and it's working great.
>
> The idea behind trying the broadcast was that since the other side of the
> join is a much larger dataset, [...] cluster
> in an RDD, why would you want to collect and then re-send it as a broadcast
> variable? Why not simply use the RDD that is already distributed on the
> worker nodes?
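A minimal sketch of the two approaches being compared, in Scala against the RDD API; the RDD names and toy data are placeholders, not code from this thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("join-vs-broadcast"))

    // Placeholder data standing in for the two sides of the join.
    val small: RDD[(String, Int)] = sc.parallelize(Seq("a" -> 1, "b" -> 2))
    val large: RDD[(String, String)] =
      sc.parallelize(Seq("a" -> "x", "b" -> "y", "a" -> "z"))

    // Collect-and-broadcast (a map-side join): the driver holds a full
    // copy of the small side, then every executor receives one.
    val smallMap = sc.broadcast(small.collectAsMap())
    val mapSide = large.flatMap { case (k, v) =>
      smallMap.value.get(k).map(s => (k, (v, s)))
    }

    // Standard RDD join: both sides are shuffled by key, but nothing
    // makes a round-trip through the driver.
    val shuffled: RDD[(String, (String, Int))] = large.join(small)

The trade-off is that the join shuffles the large side by key, while the broadcast version avoids that shuffle at the cost of shipping the small side to every executor; which one wins depends on how small the "small" side really is.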
>
> On Mar 7, 2016, at 7:44 PM, Arash wrote:
>
> Hi Tristan,
>
> This is not static, I actually collect it from an RDD to the driver.
Hi Tristan,
This is not static, I actually collect it from an RDD to the driver.
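For context, the collect-then-broadcast pattern described here looks roughly like the sketch below; the dataset and config values are illustrative. One knob worth checking when collecting ~1G of data is spark.driver.maxResultSize, whose default of 1g will abort a collect() larger than that:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("collect-then-broadcast")
      // Default is 1g; a ~1G collect() can exceed it. Value is illustrative.
      .set("spark.driver.maxResultSize", "4g")
    val sc = new SparkContext(conf)

    // Placeholder for the real dataset being collected.
    val rdd = sc.parallelize(0 until 1000000).map(i => (i.toString, i))

    val local = rdd.collect()     // pulls every element into the driver heap
    val bc = sc.broadcast(local)  // serializes it and ships it to each executor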
On Mon, Mar 7, 2016 at 5:42 PM, Tristan Nixon wrote:
> Hi Arash,
>
> is this static data? Have you considered including it in your jars and
> de-serializing it from jar on each worker node?
> It’s [...]
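For reference, a sketch of the load-from-jar approach suggested here, assuming the lookup table had been pre-serialized into the application jar at a hypothetical resource path /lookup.ser:

    import java.io.ObjectInputStream
    import org.apache.spark.{SparkConf, SparkContext}

    // Holder for the packaged lookup table. A lazy val is initialized
    // once per executor JVM, not once per task.
    object JarLookup {
      lazy val table: Map[String, Int] = {
        val in = new ObjectInputStream(
          getClass.getResourceAsStream("/lookup.ser"))
        try in.readObject().asInstanceOf[Map[String, Int]]
        finally in.close()
      }
    }

    val sc = new SparkContext(new SparkConf().setAppName("jar-lookup"))
    val data = sc.parallelize(Seq("a" -> 1, "b" -> 2))

    // The first task on each executor deserializes the table from its
    // local copy of the jar; nothing is shipped from the driver at runtime.
    val enriched = data.map { case (k, v) => (k, v, JarLookup.table.get(k)) }

As the replies note, this only helps when the data is static enough to be baked into the jar at build time.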
[...]
>
> Thanks,
> maropu
>
> On Tue, Mar 8, 2016 at 9:30 AM, Arash wrote:
>
>> Hi Ankur,
>>
>> For this specific test, I'm only running the few lines of code that are
>> pasted. Nothing else is cached in the cluster.
>>
>> Thanks,
>> Arash
Hi Ankur,
For this specific test, I'm only running the few lines of code that are
pasted. Nothing else is cached in the cluster.
Thanks,
Arash
On Mon, Mar 7, 2016 at 4:07 PM, Ankur Srivastava wrote:
> Hi,
>
> We have a use case where we broadcast ~4GB of data and we are on [...]
>
> [...] sense to me
>
> On Tue, Mar 8, 2016 at 7:29 AM, Arash wrote:
>
>> Hello all,
>>
>> I'm trying to broadcast a variable of size ~1G to a cluster of 20 nodes
>> but haven't been able to make it work so far.
>>
>> It looks like the executors s[...]

[...].2xlarge. The spark
property maximizeResourceAllocation is set to true (executor.memory = 48G
according to spark ui environment). We're also using kryo serialization and
Yarn is the resource manager.
Any ideas as to what might be going wrong and how to debug this?
Thanks,
Arash
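For anyone trying to reproduce the failure, a minimal version of the setup described above might look like the sketch below. The byte array is a stand-in for the real ~1G variable; kryo matches the serializer mentioned in the question, and the job would be submitted to YARN as described:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("big-broadcast")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // ~1 GiB placeholder payload for the broadcast.
    val payload = new Array[Byte](1 << 30)
    val bc = sc.broadcast(payload)

    // Reference the broadcast once per partition to force distribution; if
    // the executors are failing, this is the step where it should surface.
    val sizes = sc.parallelize(1 to 20, 20).map(_ => bc.value.length).collect()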