Re: OOM exception during Broadcast

2016-03-07 Thread Arash
memory?

> On Mar 7, 2016, at 8:28 PM, Arash wrote:
>
> So I just implemented the logic through a standard join (without collect
> and broadcast) and it's working great.
>
> The idea behind trying the broadcast was that since the other side of the
> join is a much larger da
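The fix described above can be sketched as follows. This is a hypothetical reconstruction, not code from the thread: `smallRdd` and `largeRdd` are illustrative names, and the point is that a standard shuffle join keeps both sides distributed, so nothing is pulled through the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JoinInsteadOfBroadcast {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-example"))

    // Stand-ins for the datasets in the thread: a ~1G "small" side and a
    // much larger side, both keyed RDDs.
    val smallRdd = sc.parallelize(Seq((1, "a"), (2, "b")))
    val largeRdd = sc.parallelize(Seq((1, 100L), (2, 200L), (3, 300L)))

    // A plain join is executed as a distributed shuffle: neither side is
    // collected to the driver, so driver memory is no longer a bottleneck.
    val joined = largeRdd.join(smallRdd)
    joined.take(10).foreach(println)

    sc.stop()
  }
}
```

The trade-off is a shuffle of the large side, which is what the broadcast was originally meant to avoid; the thread reports the join works fine in practice here.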

Re: OOM exception during Broadcast

2016-03-07 Thread Arash
cluster
> in an RDD, why would you want to collect and then re-send it as a broadcast
> variable? Why not simply use the RDD that is already distributed on the
> worker nodes?
>
> On Mar 7, 2016, at 7:44 PM, Arash wrote:
>
> Hi Tristan,
>
> This is not static, I actua

Re: OOM exception during Broadcast

2016-03-07 Thread Arash
Hi Tristan,

This is not static, I actually collect it from an RDD to the driver.

On Mon, Mar 7, 2016 at 5:42 PM, Tristan Nixon wrote:

> Hi Arash,
>
> is this static data? Have you considered including it in your jars and
> de-serializing it from jar on each worker node? It’s
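Tristan's suggestion (which turns out not to apply here, since the data is computed at runtime) would look roughly like this if the data were static: package it as a resource inside the application jar and deserialize it lazily on each executor, so no broadcast is needed. The resource name and value type are hypothetical.

```scala
import java.io.ObjectInputStream

object JarResourceData {
  // Lazily deserialized at most once per JVM, i.e. once per executor.
  // "/lookup.bin" is an assumed resource path bundled into the app jar.
  lazy val lookup: Map[String, Double] = {
    val in = getClass.getResourceAsStream("/lookup.bin")
    require(in != null, "resource /lookup.bin not found on classpath")
    val ois = new ObjectInputStream(in)
    try ois.readObject().asInstanceOf[Map[String, Double]]
    finally ois.close()
  }
}
```

Tasks would then reference `JarResourceData.lookup` directly; each executor pays the deserialization cost once, and the driver never holds the data at all.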

Re: OOM exception during Broadcast

2016-03-07 Thread Arash
s.
>
> Thanks,
> maropu
>
> On Tue, Mar 8, 2016 at 9:30 AM, Arash wrote:
>
>> Hi Ankur,
>>
>> For this specific test, I'm only running the few lines of code that are
>> pasted. Nothing else is cached in the cluster.
>>
>> Thanks,
>> Ara

Re: OOM exception during Broadcast

2016-03-07 Thread Arash
Hi Ankur,

For this specific test, I'm only running the few lines of code that are pasted. Nothing else is cached in the cluster.

Thanks,
Arash

On Mon, Mar 7, 2016 at 4:07 PM, Ankur Srivastava wrote:

> Hi,
>
> We have a use case where we broadcast ~4GB of data and we are on &

Re: OOM exception during Broadcast

2016-03-07 Thread Arash
> sense to me
>
> On Tue, Mar 8, 2016 at 7:29 AM, Arash wrote:
>
>> Hello all,
>>
>> I'm trying to broadcast a variable of size ~1G to a cluster of 20 nodes
>> but haven't been able to make it work so far.
>>
>> It looks like the executors s
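For context, the pattern being debugged in this thread is roughly the following sketch (names are illustrative, not the poster's actual code). Collecting first materializes the full ~1G on the driver, and the broadcast then ships a copy to every one of the 20 nodes, which is where the memory pressure comes from:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastPattern {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-debug"))

    val smallRdd = sc.parallelize(Seq((1, "a"), (2, "b")))
    val largeRdd = sc.parallelize(Seq((1, 100L), (2, 200L)))

    // Step 1: pull the whole small dataset onto the driver (~1G in the thread).
    val asMap = smallRdd.collectAsMap()

    // Step 2: broadcast it, serializing on the driver and deserializing a
    // full copy on every executor.
    val bc = sc.broadcast(asMap)

    // Step 3: map-side lookup against the broadcast copy, avoiding a shuffle.
    val result = largeRdd.map { case (k, v) => (k, v, bc.value.get(k)) }
    result.collect().foreach(println)

    sc.stop()
  }
}
```

Each step (collect, driver-side serialization, per-executor deserialization) needs headroom for the full payload, so the OOM can surface on either the driver or the executors.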

OOM exception during Broadcast

2016-03-07 Thread Arash
.2xlarge. The Spark property maximizeResourceAllocation is set to true (executor.memory = 48G according to the Spark UI environment page). We're also using Kryo serialization, and YARN is the resource manager.

Any ideas as to what might be going wrong and how to debug this?

Thanks,
Arash
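The configuration described above might be expressed programmatically as follows. The serializer and executor-memory values mirror the thread; the `spark.driver.memory` figure is an assumption added for illustration, since collecting and serializing a ~1G broadcast needs driver headroom well beyond the default.

```scala
import org.apache.spark.SparkConf

// Sketch of the setup under discussion: Kryo serialization, 48G executors
// (as reported by the Spark UI with maximizeResourceAllocation), YARN as
// the resource manager. Driver memory is an assumed value, not from the thread.
val conf = new SparkConf()
  .setAppName("broadcast-oom-debug")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.executor.memory", "48g")
  .set("spark.driver.memory", "8g") // assumption: headroom for collect + broadcast
```

On EMR, `maximizeResourceAllocation` derives executor settings from the instance type, so explicit values in `SparkConf` should be cross-checked against what the Spark UI environment page actually reports.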