Re: how to handle OOMError from groupByKey

2015-09-28 Thread Elango Cheran
Thanks for your replies. My answers:

> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.

If my operation were associative, I might be able to use fold, and if the operation were associative+commutative, then I could …
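[Archive note: a minimal sketch of the contrast being discussed, assuming a toy RDD of (String, Int) pairs; the object and variable names are illustrative, not from the thread.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ReduceVsGroup {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("ReduceVsGroup").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

        // groupByKey must materialize every value for a key in one task:
        val viaGroup = pairs.groupByKey().mapValues(_.sum)

        // reduceByKey combines values map-side before the shuffle, so the
        // full value list for a hot key is never held at once; this needs
        // an associative and commutative operation such as addition:
        val viaReduce = pairs.reduceByKey(_ + _)

        viaReduce.collect().foreach(println)
        sc.stop()
      }
    }

The map-side combine is the point: for a skewed key, groupByKey ships and buffers every value, while reduceByKey ships one partial result per map partition.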

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Alexis Gillain
"Note: As currently implemented, groupByKey must be able to hold all the key-value pairs for any key in memory. If a key has too many values, it can result in an [[OutOfMemoryError]]." Obvioulsy one of your key value pair is two large. You can try to increase spark.shuffle.memoryFraction. Are you

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Fabien Martin
You can try to reduce the number of containers in order to increase their memory.

2015-09-28 9:35 GMT+02:00 Akhil Das :

> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.
>
> Thanks
> Best Regards
>
> On Sat, Sep …
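[Archive note: on YARN, "fewer containers with more memory" can be sketched in the job configuration as below; the numbers are illustrative, not from the thread, and the same trade is more commonly made with --num-executors and --executor-memory at submit time.]

    import org.apache.spark.{SparkConf, SparkContext}

    // The same total cluster memory, split across fewer executors, gives
    // each executor more headroom for a heavy groupByKey.
    val conf = new SparkConf()
      .setAppName("FewerBiggerExecutors")
      .set("spark.executor.instances", "4")  // e.g. halved from 8
      .set("spark.executor.memory", "8g")    // e.g. doubled from 4g
    val sc = new SparkContext(conf)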

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Akhil Das
You can try to increase the number of partitions to get rid of the OOM errors. Also try to use reduceByKey instead of groupByKey.

Thanks
Best Regards

On Sat, Sep 26, 2015 at 1:05 AM, Elango Cheran wrote:

> Hi everyone,
> I have an RDD of the format (user: String, timestamp: Long, state:
> Bool…
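[Archive note: both shuffle operators accept an explicit partition count, which is the usual way to act on this advice; a small sketch, with an arbitrary count of 200.]

    import org.apache.spark.{SparkConf, SparkContext}

    object MorePartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("MorePartitions").setMaster("local[*]"))
        val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

        // More output partitions mean smaller shuffle blocks per task:
        val grouped = pairs.groupByKey(200)
        val reduced = pairs.reduceByKey(_ + _, 200)

        reduced.collect().foreach(println)
        sc.stop()
      }
    }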