Re: Problem with repartition/OOM

2015-09-06 Thread Yana Kadiyska
Thanks Yanbo, I was running with 1G per executor; my file is 7.5G, and with the standard block size of 128M that gives 7500M/128M ≈ 59 partitions naturally. My boxes have 8 CPUs, so I figured they could be processing 8 tasks/partitions at a time, needing 8*(partition_size) memory per executo
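
For reference, the arithmetic in this message can be sketched as below. This is only a back-of-the-envelope check, not anything Spark computes itself: the helper names are hypothetical, sizes are in decimal-ish "M" as used in the message, and it assumes the in-memory footprint of a partition equals its on-disk size, which is usually optimistic for compressed columnar data.

```python
import math


def natural_partitions(file_mb, block_mb=128):
    """Number of input splits when a file is cut into fixed-size blocks."""
    return math.ceil(file_mb / block_mb)


def concurrent_task_memory_mb(cores, partition_mb):
    """Rough lower bound on memory needed if every core holds one partition.

    Assumption: one running task per core, each holding one full partition,
    with no decompression or object overhead counted.
    """
    return cores * partition_mb


if __name__ == "__main__":
    print(natural_partitions(7500))           # 7.5G file, 128M blocks
    print(concurrent_task_memory_mb(8, 128))  # 8 cores, 128M partitions
```

With the numbers from the message, 8 concurrent 128M partitions already come to about 1G, which is the whole executor allocation before any other overhead is counted.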

Re: Problem with repartition/OOM

2015-09-05 Thread Yanbo Liang
The Parquet output writer allocates one block for each table partition it is processing and writes partitions in parallel. It will run out of memory if (number of partitions) times (Parquet block size) is greater than the available memory. You can try to decrease the number of partitions. And could
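
A minimal sketch of the rule of thumb above, (number of partitions) times (Parquet block size) versus available memory. The function names are hypothetical, and the 128M default block size is an assumption about the writer configuration; substitute whatever your job actually uses.

```python
MB = 1024 * 1024


def parquet_write_buffer_bytes(open_partitions, block_size_bytes=128 * MB):
    """Estimate the memory held by the writer: one block per open partition."""
    return open_partitions * block_size_bytes


def fits_in_memory(open_partitions, available_bytes, block_size_bytes=128 * MB):
    """True if the estimated write buffers fit in the available memory."""
    needed = parquet_write_buffer_bytes(open_partitions, block_size_bytes)
    return needed <= available_bytes


if __name__ == "__main__":
    executor_memory = 1024 * MB  # hypothetical 1G executor
    for partitions in (4, 8, 59):
        ok = fits_in_memory(partitions, executor_memory)
        print(partitions, "partitions:", "fits" if ok else "likely OOM")
```

This treats all of the executor memory as available to the writer, which overstates the headroom; in practice only a fraction of the heap is free for write buffers, so the real limit is lower than this estimate suggests.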

Problem with repartition/OOM

2015-09-05 Thread Yana Kadiyska
Hi folks, I have a strange issue. I'm trying to read a 7G file and do fairly simple stuff with it: I can read the file and do simple operations on it. However, I'd prefer to increase the number of partitions in preparation for more memory-intensive operations (I'm happy to wait, I just need the job to com