Re: Problem with repartition/OOM

2015-09-06 Thread Yana Kadiyska
Thanks Yanbo, I was running with 1G per executor; my file is 7.5 G, running with the standard block size of 128M, resulting in 7500/128M= 59 partitions naturally. My boxes have 8CPUs, so I figured they could be processing 8 tasks/partitions at a time, needing 8*(partition_size) memory per executo

Re: Problem with repartition/OOM

2015-09-05 Thread Yanbo Liang
The Parquet output writer allocates one block for each table partition it is processing and writes partitions in parallel. It will run out of memory if (number of partitions) times (Parquet block size) is greater than the available memory. You can try to decrease the number of partitions. And could