Dear Spark Users,

Hope you are staying safe in these times of COVID.

I’m writing today to ask a question that has been danced around quite a bit on 
StackOverflow, in blog posts, and on other technical forums.

When does one boost spark.executor.memory vs. spark.executor.memoryOverhead, 
in both the Java and Python cases?
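
For concreteness, here is how we set the two knobs today (a minimal PySpark 
sketch; the values are placeholders, and how to choose them is exactly the 
question):

    from pyspark.sql import SparkSession

    # Placeholder values -- how to split the budget between these two
    # settings is exactly what we're asking about.
    spark = (
        SparkSession.builder
        .appName("memory-split-example")
        .config("spark.executor.memory", "6g")          # JVM heap ("main memory")
        .config("spark.executor.memoryOverhead", "2g")  # off-heap: native allocations, buffers
        .getOrCreate()
    )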


We are interested in any detailed resources you may have that identify what 
goes into overhead vs. main memory. We’re aware of the current documentation, 
but we have observations indicating that more may be going into overhead than 
we’re led to believe.

For the specifically Python case, what would go into 
spark.executor.pyspark.memory vs. spark.executor.memoryOverhead?
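
For reference, here is how we currently read the Python-side split (again a 
sketch with placeholder values; if we read the docs right, on YARN the 
container request is roughly the sum of all three settings):

    from pyspark.sql import SparkSession

    # Placeholder values; our reading is that the YARN container request
    # is roughly the sum of the three settings below.
    spark = (
        SparkSession.builder
        .appName("pyspark-memory-example")
        .config("spark.executor.memory", "6g")           # JVM heap
        .config("spark.executor.pyspark.memory", "2g")   # Python worker limit (Spark 2.4+)
        .config("spark.executor.memoryOverhead", "1g")   # everything else off-heap
        .getOrCreate()
    )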

We’re happy to follow any guidance you can point us to, but after searching 
the Spark JIRA, StackOverflow, and other resources, this remains a bit of a 
mystery given our observations.

I can think of many occasions where we’ve observed a FetchFailedException and 
weren’t entirely certain whether to bump memoryOverhead, main memory, or, in 
the Python case, PySpark memory.


Thanks,

Shelby Vanhooser





// Specific evidence, if curious:

We have a job that fails when it parses very long string columns of a 
DataFrame, even with main memory (spark.executor.memory) boosted to 27GB and 
overhead boosted to 3GB. However, when spark.executor.memoryOverhead is 
boosted to 8GB and main memory is left at 6GB, the job succeeds.
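
Put as arithmetic (assuming YARN, where the container request is roughly heap 
plus overhead):

    # Failing run:    27 GB heap + 3 GB overhead = 30 GB container,
    #                 but only 3 GB available outside the JVM heap.
    # Succeeding run:  6 GB heap + 8 GB overhead = 14 GB container,
    #                 with 8 GB available outside the heap.
    # The run with less *total* memory succeeds, which points at off-heap
    # (native) allocations rather than overall memory pressure.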

My own understanding of the Java side of Spark is that UDFs and standard 
SQL-like operations request memory from main memory, not from memoryOverhead, 
but the user’s report suggests otherwise. The specific code they ran contained 
no UDFs; it used only built-in SQL functions from Java. Again, this job was 
fixed only by bumping overhead.
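
To give a feel for the shape of the job, here is a hypothetical PySpark 
reconstruction on my part (their actual code was Java, but it used only the 
equivalent built-in functions, with no UDFs anywhere):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in: one very long string column, parsed with
    # built-in SQL functions only -- no Java or Python UDFs.
    df = spark.createDataFrame([("a,b,c," * 100000,)], ["long_text"])

    parsed = (
        df.withColumn("tokens", F.split(F.col("long_text"), ","))
          .withColumn("n_tokens", F.size("tokens"))
    )
    parsed.show()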

We’re aware of the need to boost PySpark memory / memoryOverhead when using 
Python, but I wasn’t aware the same applied in Java, especially given the 
bare-bones SQL nature of this job.

My current leaning is that anything not operating directly in Catalyst falls 
back to memoryOverhead, in which case we would need to budget accordingly; but 
it is worrisome that even standard operations appear to need a boosted 
memoryOverhead.
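
If that leaning is right, budgeting would have to look something like the 
following (the category split is our guess, not documented guidance):

    # Guessed budget if everything outside Catalyst/Tungsten really does
    # land in overhead (the categories are assumptions on our part):
    budget_gb = {
        "spark.executor.memory": 6,           # Catalyst/Tungsten-managed heap
        "spark.executor.pyspark.memory": 2,   # Python workers, if any
        "spark.executor.memoryOverhead": 8,   # shuffle/netty buffers, native
                                              # allocations, everything else
    }
    container_gb = sum(budget_gb.values())    # ~16 GB per executor on YARN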
