Thanks for your replies. My answers:
> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.
If my operation were associative, I might be able to use fold, and if the
operation were associative+commutative, then I could use reduceByKey.
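To make that distinction concrete, a minimal sketch (the pairs data is a toy
placeholder, and sc is assumed to be an existing SparkContext):

    // Toy (key, value) data; summation is associative + commutative.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // groupByKey must hold every value for a key in memory at once:
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values map-side before the shuffle, so no
    // single task ever sees all the values for a key at the same time:
    val viaReduce = pairs.reduceByKey(_ + _)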
"Note: As currently implemented, groupByKey must be able to hold all the
key-value pairs for any key in memory. If a key has too many values, it can
result in an [[OutOfMemoryError]]."
Obviously one of your key-value pairs is too large. You can try to increase
spark.shuffle.memoryFraction.
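For reference, that setting can be raised when building the context (a
sketch; 0.4 is only an example value to tune, the Spark 1.x default
being 0.2):

    import org.apache.spark.{SparkConf, SparkContext}

    // Give shuffle aggregation buffers a larger share of the heap.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-example")  // hypothetical app name
      .set("spark.shuffle.memoryFraction", "0.4")
    val sc = new SparkContext(conf)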
Are you
You can try to reduce the number of containers in order to increase their
memory.
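On YARN that trade-off can be expressed through the executor settings (a
sketch; both values are placeholders to tune for your cluster):

    import org.apache.spark.SparkConf

    // Fewer, larger containers:
    val conf = new SparkConf()
      .set("spark.executor.instances", "5")  // fewer executors/containers
      .set("spark.executor.memory", "8g")    // more heap for each one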
2015-09-28 9:35 GMT+02:00 Akhil Das:
> You can try to increase the number of partitions to get rid of the OOM
> errors. Also try to use reduceByKey instead of groupByKey.
>
> Thanks
> Best Regards
>
> On Sat, Sep 26, 2015 at 1:05 AM, Elango Cheran wrote:
You can try to increase the number of partitions to get rid of the OOM
errors. Also try to use reduceByKey instead of groupByKey.
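For example (a sketch; rdd stands in for an existing RDD of summable
(K, V) pairs, and 200 is a placeholder partition count):

    // Spread the same data across more partitions so each task holds less:
    val spread = rdd.repartition(200)

    // Shuffle operations such as reduceByKey also take an explicit count:
    val counts = rdd.reduceByKey(_ + _, 200)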
Thanks
Best Regards
On Sat, Sep 26, 2015 at 1:05 AM, Elango Cheran wrote:
> Hi everyone,
> I have an RDD of the format (user: String, timestamp: Long, state:
> Boolean)