Thanks Yanbo,
I was running with 1G per executor; my file is 7.5G. With the
standard block size of 128M, that gives 7500M/128M ≈ 59 partitions
naturally. My boxes have 8 CPUs, so I figured they could be processing 8
tasks/partitions at a time, needing
8 × (partition size) of memory per executor.
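The back-of-envelope above can be sanity-checked with a quick calculation (pure Python; the numbers are the ones quoted in this thread). Note that 8 concurrent 128M partitions already add up to the whole 1G executor heap, before any Parquet write buffers:

```python
# Rough per-executor memory estimate, using the numbers from this thread.
file_size_mb = 7500          # ~7.5G input file
block_size_mb = 128          # HDFS block size -> one partition per block
cores_per_executor = 8       # tasks running concurrently per executor

partitions = -(-file_size_mb // block_size_mb)      # ceiling division
concurrent_mb = cores_per_executor * block_size_mb  # memory for in-flight tasks

print(partitions)      # 59
print(concurrent_mb)   # 1024 -- the full 1G executor heap, with no headroom
```

So with a 1G executor there is essentially no headroom left for anything beyond the raw task inputs.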
The Parquet output writer allocates one block for each table partition it
is processing and writes partitions in parallel. It will run out of memory
if (number of partitions) times (Parquet block size) is greater than the
available memory. You can try to decrease the number of partitions. And
could
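The out-of-memory condition described in the reply above can be written down directly (a sketch, assuming the writer really buffers one Parquet block per open output partition, as stated; the 1024 MB figure is the 1G executor heap from this thread):

```python
# OOM condition from the reply: the Parquet writer buffers one block per
# partition it is writing, so it fails when
#   (number of partitions) * (Parquet block size) > available memory.
def parquet_writer_fits(num_partitions, parquet_block_mb, available_mb):
    return num_partitions * parquet_block_mb <= available_mb

print(parquet_writer_fits(59, 128, 1024))  # False: 59 * 128M is far over 1G
print(parquet_writer_fits(8, 128, 1024))   # True after coalescing to 8
```

This is why decreasing the number of partitions (e.g. with `coalesce`) before the Parquet write, or raising executor memory, avoids the failure.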
Hi folks, I have a strange issue. Trying to read a 7G file and do fairly
simple stuff with it:
I can read the file and do simple operations on it. However, I'd prefer to
increase the number of partitions in preparation for more memory-intensive
operations (I'm happy to wait, I just need the job to complete).
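One way to pick a higher partition count is to aim for a smaller per-task chunk so that memory-heavy operations fit in the heap. A sketch (the 32M target is an assumed figure, not from this thread; the resulting count would be passed to Spark's `repartition`):

```python
# Pick a partition count from a target per-partition size.
file_size_mb = 7500          # ~7.5G input file, as in this thread
target_partition_mb = 32     # assumed target chunk size, not from the thread

num_partitions = -(-file_size_mb // target_partition_mb)  # ceiling division
print(num_partitions)        # 235 -> e.g. rdd.repartition(num_partitions)
```

With 8 tasks in flight that would keep the raw input footprint per executor around 8 × 32M = 256M, leaving room for the operation itself.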