Any thoughts?

I can explain the problem in more detail, but basically the shuffle data
doesn't seem to fit in reducer memory (32GB), and I am looking for ways to
process it using disk and memory together.
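
One direction that keeps memory bounded is to stop accumulating every
combination in a mutable Map and emit them one at a time from an Iterator,
so the shuffle layer rather than the merge function decides what to spill.
The sketch below is only an illustration under assumptions: the
`UserDataSet` case class is a hypothetical stand-in for the real type,
`LazyCombinations`, `expand` and `maxSize` are made-up names, and the real
combinatorics step may differ from plain subsets of the element set.

```scala
// Hypothetical stand-in for the UserDataSet type mentioned in the thread.
final case class UserDataSet(values: Set[String])

object LazyCombinations {
  // Lazily yields every non-empty combination of `elements` with at most
  // `maxSize` members. Only one combination is materialized at a time.
  def combinations(elements: Seq[String], maxSize: Int): Iterator[Seq[String]] = {
    val sorted = elements.distinct.sorted
    (1 to math.min(maxSize, sorted.size)).iterator
      .flatMap(k => sorted.combinations(k))
  }

  // Pairs each combination with the original key, still as an Iterator,
  // so the caller can stream the records instead of holding them all.
  def expand(key: String, elements: Seq[String], maxSize: Int)
      : Iterator[((String, String), UserDataSet)] =
    combinations(elements, maxSize).map { comb =>
      ((key, comb.mkString("|")), UserDataSet(comb.toSet))
    }
}
```

On an RDD, a function like `expand` could be called from `flatMap` or
`mapPartitions`, both of which accept an iterator-producing function, so
the combinations for one key never have to sit in a single in-memory
collection. Whether a `UserDataSet` can really be built per combination
like this depends on details not shown in the thread.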

Thanks

On Thu, Apr 28, 2016 at 10:07 AM, Nirav Patel <npa...@xactlycorp.com> wrote:

> Hi,
>
> I tried to convert a groupByKey operation to aggregateByKey in the hope of
> avoiding memory and high GC issues when dealing with 200GB of data.
> I need to create a collection of resulting key-value pairs that
> represents all combinations for a given key.
>
> My merge function definition is as follows:
>
> private def processDataMerge(
>     map1: collection.mutable.Map[String, UserDataSet],
>     map2: collection.mutable.Map[String, UserDataSet])
>   : collection.mutable.Map[String, UserDataSet] = {
>
>   // pseudocode
>
>   map1 + map2
>   (Set[combEle1], Set[combEle2] ... ) = map1.map(...extract all elements here)
>   comb1 = combinatorics(Set[combEle1])
>   ..
>   totalcombinations = comb1 + comb2 + ..
>
>   map1 + totalcombinations.map(comb => (comb -> UserDataSet))
>
> }
>
>
> The output of one merge (or seq) is basically the combinations of the input
> collection's elements, and so on for each subsequent merge. So in the end
> you get all combinations for a given key.
>
> It performs worse with aggregateByKey than with groupByKey under the same
> configuration. groupByKey used to stall at the last 9 partitions out of
> 4000. This one stalls even earlier (due to high GC). I kill the job after
> it hangs for hours on the same task.
>
> I give each executor 25GB of memory plus 4GB of overhead. My cluster can't
> allocate more than 32GB per executor.
>
> I thought of custom partitioning my keys so there's less data per key and
> hence fewer combinations. That will help with data skew, but wouldn't it
> come to the same thing in the end? At some point the key-values spread
> across different salts will need to be merged, and the memory issue will
> come back at that point!
>
> Any pointers to resolve this? Perhaps an external merge?
>
> Thanks
> Nirav
