Thanks Yanbo,
I was running with 1 GB per executor; my file is 7.5 GB, and with the
standard block size of 128 MB that gives roughly 7500 MB / 128 MB ≈ 59
partitions naturally. My boxes have 8 CPUs, so I figured they could be
processing 8 tasks/partitions at a time, needing
8 * (partition size) of memory per executor.
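As a rough sanity check, that reasoning can be sketched in a few lines. The figures below are the ones quoted in this thread (7.5 GB file, 128 MB blocks, 8 cores, 1 GB executors), not measured values:

```python
# Back-of-the-envelope version of the estimate above.
FILE_SIZE_MB = 7500          # ~7.5 GB input file, as quoted in the thread
BLOCK_SIZE_MB = 128          # standard HDFS block size
CORES_PER_EXECUTOR = 8       # 8 CPUs per box

# Ceiling division: one partition per block.
partitions = -(-FILE_SIZE_MB // BLOCK_SIZE_MB)           # 59 partitions

# With 8 tasks running concurrently, each holding one partition-sized
# buffer, the executor needs roughly:
memory_needed_mb = CORES_PER_EXECUTOR * BLOCK_SIZE_MB    # 1024 MB

print(partitions, memory_needed_mb)
```

Note that 8 * 128 MB is already 1024 MB, i.e. the entire 1 GB executor, before counting any other overhead, which is consistent with running out of memory here.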
The Parquet output writer allocates one block for each table partition it
is processing and writes partitions in parallel. It will run out of memory
if (number of partitions) times (Parquet block size) is greater than the
available memory. You can try to decrease the number of partitions. And
could
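The condition described above can be expressed as a simple check. This is only a sketch of the rule of thumb from this thread (one buffered Parquet block per partition being written), not an exact model of the writer's memory use:

```python
def parquet_writer_fits(open_partitions, parquet_block_mb, available_mb):
    """Rough OOM check from the thread: the Parquet writer buffers one
    block per partition it is writing, so all open blocks must fit in
    the available memory at once."""
    return open_partitions * parquet_block_mb <= available_mb

# With the figures from this thread (1 GB executor, 128 MB blocks):
print(parquet_writer_fits(8, 128, 1024))   # 8 open partitions just fit
print(parquet_writer_fits(9, 128, 1024))   # one more and the writer can OOM
```

Decreasing the number of partitions (e.g. with `coalesce`/`repartition` before the write) shrinks the left-hand side of that inequality.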