Re: physical memory usage keep increasing for spark app on Yarn

2017-02-15 Thread Yang Cao
Hi Pavel! Sorry for the late reply. I did some investigation over the last few days with my colleague. Here is my thought: since Spark 1.2, Netty with off-heap memory is used to reduce GC during shuffle and cached block transfer. In my case, if I try to increase the memory overhead enough, I will get the Max dir…
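For reference, a minimal sketch of how the YARN memory overhead mentioned above can be raised, assuming a Spark 1.6/2.1-era application; the 3 GB figure is a placeholder for illustration, not a value taken from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical sketch: give each YARN executor more off-heap headroom.
    // spark.yarn.executor.memoryOverhead is specified in megabytes in Spark 1.6/2.1.
    val conf = new SparkConf()
      .setAppName("memory-overhead-sketch")
      .set("spark.yarn.executor.memoryOverhead", "3072") // placeholder value
    val sc = new SparkContext(conf)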

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-23 Thread Pavel Plotnikov
Hi Yang! I don't know exactly why this happens, but I think either GC can't work fast enough, or the size of the data plus the additional objects created during computation is too big for the executor. And I found this problem appears only if you do some data manipulations. You can cache your data first, and after that write in…
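A minimal sketch of the cache-first-then-write pattern being suggested, using the Spark 2.x DataFrame API; the input/output paths and the filter are illustrative only, not taken from the thread:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("cache-then-write-sketch").getOrCreate()

    // Illustrative input and transformation; substitute your own.
    val computed = spark.read.parquet("/data/input")
      .filter(col("value") > 0)
      .persist(StorageLevel.MEMORY_AND_DISK)

    computed.count()                        // materialize the cache before writing
    computed.write.parquet("/data/output")  // illustrative output path
    computed.unpersist()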

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-22 Thread Yang Cao
Also, do you know why this happens? > On Jan 20, 2017, at 18:23, Pavel Plotnikov wrote: > Hi Yang, I have faced the same problem on Mesos, and to circumvent this issue I usually increase the partition number. On the last step in your code you reduce the number of partitions to 1; try to set…

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-22 Thread Yang Cao
Hi, Thank you for your suggestion. As far as I know, if I set it to a bigger number I won't get the output as one file, right? My task is designed to combine all the small files of one day into one big Parquet file. THX again. Best, > On Jan 20, 2017, at 18:23, Pavel Plotnikov wrote: > Hi Yang, I…
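If a single output file is truly required, one common approach, not spelled out in this thread, is to coalesce to one partition only at the very end of the job; a sketch with illustrative paths follows:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("compact-small-files-sketch").getOrCreate()

    // Read one day's worth of small files and write them back as a single Parquet file.
    // coalesce(1) avoids a full shuffle, but the single output task must process
    // the whole day's data, which is where the memory pressure comes from.
    spark.read.parquet("/data/events/day=2017-01-19")   // illustrative input path
      .coalesce(1)
      .write
      .parquet("/data/compacted/day=2017-01-19")        // illustrative output path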

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-20 Thread Pavel Plotnikov
Hi Yang, I have faced the same problem on Mesos, and to circumvent this issue I usually increase the partition number. On the last step in your code you reduce the number of partitions to 1; try to set a bigger value, maybe it will solve this problem. Cheers, Pavel On Fri, Jan 20, 2017 at 12:35 PM Yang Cao…
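For concreteness, a sketch of what "a bigger value" could look like on the final step; the partition count of 200 and the paths are placeholders, not numbers from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("more-partitions-sketch").getOrCreate()

    // repartition(1) funnels the whole dataset through one task; a larger
    // partition count spreads the work and memory pressure across executors.
    spark.read.parquet("/data/input")        // illustrative input
      .repartition(200)                      // placeholder partition count
      .write
      .parquet("/data/output")               // illustrative output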

physical memory usage keep increasing for spark app on Yarn

2017-01-20 Thread Yang Cao
Hi all, I am running a Spark application in YARN-client mode with 6 executors (each with 4 cores, executor memory = 6G, and overhead = 4G; Spark version: 1.6.3 / 2.1.0). I find that my executor memory keeps increasing until the executor gets killed by the node manager, which gives out info telling me to boost…
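The setup described above corresponds roughly to the sketch below; the property names are the standard Spark 1.6/2.1 ones, but the original submit command is not shown in the thread, so treat this as an assumed reconstruction:

    import org.apache.spark.{SparkConf, SparkContext}

    // Rough equivalent of the setup described: YARN client mode, 6 executors,
    // 4 cores and 6 GB heap each, plus 4 GB of YARN memory overhead per executor.
    val conf = new SparkConf()
      .setAppName("physical-memory-sketch")
      .setMaster("yarn-client")                            // Spark 1.6-style; in 2.x use "yarn" with client deploy mode
      .set("spark.executor.instances", "6")
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "6g")
      .set("spark.yarn.executor.memoryOverhead", "4096")   // MB
    val sc = new SparkContext(conf)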