Re: flatMap followed by mapPartitions

2014-11-14 Thread Debasish Das
mapPartitions tried to hold the data in memory, which did not work for me. I am now doing flatMap followed by groupByKey with a HashPartitioner, and the number of blocks is 60 (based on the 120 cores I am running the job on). When the shuffle size is < 100 GB it works fine... as the flatMap shuffle goes to 200 GB
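For readers following the thread, the pipeline described in this message could look roughly like the sketch below. Only flatMap, groupByKey, HashPartitioner and the partition count of 60 come from the message. The local master, the toy input and the expansion function are assumptions made for illustration, not the poster's actual job.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x

object FlatMapGroupByKeySketch {
  def main(args: Array[String]): Unit = {
    // local[2] and the app name are placeholders for illustration.
    val conf = new SparkConf()
      .setAppName("flatmap-groupbykey-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy input standing in for the real data set (assumption).
      val input = sc.parallelize(1 to 1000, 8)

      // Hypothetical expansion: each record emits several (key, value) pairs.
      // This step is narrow; nothing is shuffled yet.
      val pairs = input.flatMap(x => Seq((x % 10, x), (x % 7, x * 2)))

      // The shuffle happens here. 60 mirrors the block count in the message.
      val grouped = pairs.groupByKey(new HashPartitioner(60))

      println(s"partitions after groupByKey: ${grouped.partitions.length}")
      println(s"distinct keys: ${grouped.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that groupByKey materializes all values for a key on one task, so the number and size of partitions directly affect how much each task has to hold.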

Re: flatMap followed by mapPartitions

2014-11-12 Thread Mayur Rustagi
flatMap would have to shuffle data only if the output RDD is expected to be partitioned by some key: RDD[X].flatMap(X => Seq[Y]) gives RDD[Y]. If it has to shuffle, it should be local.
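The claim that flatMap alone does not shuffle can be checked on a small local job. The sketch below is an illustration under assumed inputs (the sample strings, the local master and the partition counts are all made up). It inspects the RDD's dependencies and partitioner rather than relying on any timing.

```scala
import org.apache.spark.{HashPartitioner, NarrowDependency, SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x

object FlatMapIsNarrowSketch {
  def main(args: Array[String]): Unit = {
    // local[2] and the app name are placeholders for illustration.
    val conf = new SparkConf()
      .setAppName("flatmap-narrow-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val base = sc.parallelize(Seq("a b", "c d e", "f"), 3)
      val words = base.flatMap(line => line.split(" ").toSeq)

      // flatMap reads each parent partition in place: a narrow dependency,
      // so no shuffle stage is introduced.
      val narrow = words.dependencies.forall(_.isInstanceOf[NarrowDependency[_]])
      println(s"flatMap uses only narrow dependencies: $narrow")
      println(s"partitions before/after: ${base.partitions.length}/${words.partitions.length}")

      // On a key-partitioned RDD, flatMap may change keys, so the partitioner
      // is dropped; flatMapValues leaves keys alone and keeps it.
      val keyed = words.map(w => (w, 1)).partitionBy(new HashPartitioner(4))
      val viaFlatMap = keyed.flatMap { case (k, v) => Seq((k, v), (k, v + 1)) }
      val viaFlatMapValues = keyed.flatMapValues(v => Seq(v, v + 1))
      println(s"partitioner kept by flatMap: ${viaFlatMap.partitioner.isDefined}")
      println(s"partitioner kept by flatMapValues: ${viaFlatMapValues.partitioner.isDefined}")
    } finally {
      sc.stop()
    }
  }
}
```

If the downstream step needs the data partitioned by key, using flatMapValues where the keys do not change is one way to avoid a second shuffle.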