Any thoughts? I can explain more on problem but basically shuffle data doesn't seem to fit in reducer memory (32GB) and I am looking ways to process them on disk+memory.
Thanks On Thu, Apr 28, 2016 at 10:07 AM, Nirav Patel <npa...@xactlycorp.com> wrote: > Hi, > > I tried to convert a groupByKey operation to aggregateByKey in a hope to > avoid memory and high gc issue when dealing with 200GB of data. > I needed to create a Collection of resulting key-value pairs which > represent all combinations of given key. > > My merge fun definition is as follows: > > private def processDataMerge(map1: collection.mutable.Map[String, > UserDataSet], > map2: > collection.mutable.Map[String, UserDataSet]) > : collection.mutable.Map[String, UserDataSet] = { > > //psuedo code > > map1 + map2 > (Set[combEle1], Set[combEle2] ... ) = map1.map(...extract all elements > here) > comb1 = cominatorics(Set[CombELe1]) > .. > totalcombinations = comb1 + comb2 + .. > > map1 + totalcombinations.map(comb => (comb -> UserDataSet)) > > } > > > Output of one merge(or seq) is basically combinations of input collection > elements and so and so on. So finally you get all combinations for given > key. > > Its performing worst using aggregateByKey then groupByKey with same > configuration. GroupByKey used to halt at last 9 partitions out of 4000. > This one is halting even earlier. (halting due to high GC). I kill the job > after it halts for hours on same task. > > I give 25GB executor memory and 4GB overhead. My cluster can't allocate > more than 32GB per executor. > > I thought of custom partitioning my keys so there's less data per key and > hence less combination. that will help with data skew but wouldn't in the > end it would come to same thing? Like at some point it will need to merge > key-values spread across different salt and it will come to memory issue at > that point! > > Any pointer to resolve this? perhaps an external merge ? > > Thanks > Nirav > > > > Thanks > > > > > -- [image: What's New with Xactly] <http://www.xactlycorp.com/email-click/> <https://www.nyse.com/quote/XNYS:XTLY> [image: LinkedIn] <https://www.linkedin.com/company/xactly-corporation> [image: Twitter] <https://twitter.com/Xactly> [image: Facebook] <https://www.facebook.com/XactlyCorp> [image: YouTube] <http://www.youtube.com/xactlycorporation>