Re: how to handle OOMError from groupByKey

2015-09-28 Thread Elango Cheran
Thanks for your replies. My answers:

> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.

If my operation were associative, I might be able to use fold, and if the operation were associative+commutative, then I could …
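[Archive note: a minimal sketch of the contrast being discussed, assuming a toy RDD of (String, Int) pairs; the object and variable names are illustrative, not from the thread.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ReduceVsGroup {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ReduceVsGroup").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

        // groupByKey must materialize every value for a key in one task:
        val viaGroup = pairs.groupByKey().mapValues(_.sum)

        // reduceByKey combines values map-side before the shuffle, so the
        // full value list for a hot key is never held at once; this needs
        // an associative and commutative operation such as addition:
        val viaReduce = pairs.reduceByKey(_ + _)

        viaReduce.collect().foreach(println)
        sc.stop()
      }
    }

The map-side combine is the point: for a skewed key, groupByKey ships and buffers every value, while reduceByKey ships one partial result per map partition.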

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Alexis Gillain
"Note: As currently implemented, groupByKey must be able to hold all the key-value pairs for any key in memory. If a key has too many values, it can result in an [[OutOfMemoryError]]." Obvioulsy one of your key value pair is two large. You can try to increase spark.shuffle.memoryFraction. Are you

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Fabien Martin
You can try to reduce the number of containers in order to increase their memory.

2015-09-28 9:35 GMT+02:00 Akhil Das :

> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.
>
> Thanks
> Best Regards
>
> On Sat, Sep …
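[Archive note: on YARN, "fewer containers with more memory" can be sketched in the job configuration as below; the numbers are illustrative, not from the thread, and the same trade is more commonly made with --num-executors and --executor-memory at submit time.]

    import org.apache.spark.{SparkConf, SparkContext}

    // The same total cluster memory, split across fewer executors, gives
    // each executor more headroom for a heavy groupByKey.
    val conf = new SparkConf()
      .setAppName("FewerBiggerExecutors")
      .set("spark.executor.instances", "4")  // e.g. halved from 8
      .set("spark.executor.memory", "8g")    // e.g. doubled from 4g
    val sc = new SparkContext(conf)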

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Akhil Das
You can try to increase the number of partitions to get rid of the OOM errors. Also try to use reduceByKey instead of groupByKey.

Thanks
Best Regards

On Sat, Sep 26, 2015 at 1:05 AM, Elango Cheran wrote:

> Hi everyone,
> I have an RDD of the format (user: String, timestamp: Long, state:
> Bool…
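[Archive note: both shuffle operators accept an explicit partition count, which is the usual way to act on this advice; a small sketch, with an arbitrary count of 200.]

    import org.apache.spark.{SparkConf, SparkContext}

    object MorePartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("MorePartitions").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

        // More output partitions mean smaller shuffle blocks per task:
        val grouped = pairs.groupByKey(200)
        val reduced = pairs.reduceByKey(_ + _, 200)

        reduced.collect().foreach(println)
        sc.stop()
      }
    }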