Re: Yarn containers getting killed, error 52, multiple joins

2017-04-14 Thread Rick Moritz
Potentially, with joins, you run out of memory on a single executor because a small skew in your data gets amplified by the shuffle. You could try increasing the default number of partitions, reducing the number of tasks that run concurrently on each executor (spark.executor.cores), or adding a repartitioning operation before the joins, as sketched below.
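A minimal sketch of those three mitigations, assuming a DataFrame join on a single key column. The config keys are standard Spark settings; the names facts, dims, key, and the paths are hypothetical placeholders, not anything from the original thread:

import org.apache.spark.sql.SparkSession

object SkewMitigationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("skew-mitigation-sketch")
      // 1. More shuffle partitions spread a skewed key across smaller tasks.
      .config("spark.sql.shuffle.partitions", "2000")
      // 2. Fewer concurrent tasks per executor leave more heap per task;
      //    set at submit time, e.g. spark-submit --executor-cores 2
      .getOrCreate()

    val facts = spark.read.parquet("/data/facts") // hypothetical inputs
    val dims  = spark.read.parquet("/data/dims")

    // 3. Repartition on the join key before joining, so each task sees
    //    a more uniform slice of the data.
    val joined = facts
      .repartition(2000, facts("key"))
      .join(dims, "key")

    joined.write.parquet("/data/joined") // hypothetical output path
    spark.stop()
  }
}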

Re: Yarn containers getting killed, error 52, multiple joins

2017-04-13 Thread Chen, Mingrui
1.5TB is incredibly high; it doesn't look like a configuration problem. Could you paste the code snippet that runs the loop and join over the dataset? Best regards,
From: rachmaninovquartet Sent: Thursday, April 13, 2017 10:08:40 AM To: user@spark.apache.org Subject: Yarn containers getting killed, error 52, multiple joins
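For illustration only, a hypothetical sketch of the kind of loop-of-joins pattern being asked about, and why it can exhaust executor memory (exit code 52 is Spark's out-of-memory exit code): each iteration extends the query plan, and a skewed join key grows the shuffle blocks. The names base, lookups, and key are placeholders, not the original poster's code; the periodic checkpoint (Dataset.checkpoint, Spark 2.1+) is one common way to truncate the growing lineage:

import org.apache.spark.sql.{DataFrame, SparkSession}

object IterativeJoinsSketch {
  def run(spark: SparkSession, base: DataFrame, lookups: Seq[DataFrame]): DataFrame = {
    var df = base
    for ((lookup, i) <- lookups.zipWithIndex) {
      // Each join adds another shuffle stage to the plan.
      df = df.join(lookup, "key")
      // Materialize every few iterations to cut the accumulated plan;
      // requires spark.sparkContext.setCheckpointDir(...) beforehand.
      if (i % 3 == 2) df = df.checkpoint()
    }
    df
  }
}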