Hi everyone,

This is more of a YARN/OS question than a Spark one, but it would be good to clarify this in the docs somewhere once I get an answer.

We use PySpark for all our Spark applications running on EMR. Like many users, we're accustomed to seeing the occasional ExecutorLostFailure after YARN kills a container for using more memory than it was allocated.

We're beginning to tune spark.yarn.executor.memoryOverhead, but before changing it I wanted to check: does YARN monitor the memory usage of both the executor JVM and the spawned pyspark.daemon process, or just the JVM?

Inspecting one of the YARN nodes suggests the daemon may not be covered, since the spawned daemon gets a separate process ID and process group. I'd like to confirm this either way, as it could make a big difference to PySpark users hoping to tune their memory settings.

Thanks,
Mike
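
For context, here is a rough sketch of the sizing arithmetic behind the question. It is not taken from Spark or YARN source. The function names are hypothetical, and the default overhead rule used here (the larger of 384 MB or 10% of executor memory) is an assumption that varies by Spark version. It also ignores any rounding YARN applies to container requests.

```python
# Hypothetical helpers for reasoning about the container limit.
# Both constants are assumptions; check the docs for your Spark version.
MIN_OVERHEAD_MB = 384    # assumed floor for the default overhead
OVERHEAD_FACTOR = 0.10   # assumed default fraction of executor memory


def container_limit_mb(executor_memory_mb, memory_overhead_mb=None):
    """Approximate container size: executor memory plus memory overhead.

    If memory_overhead_mb is None, fall back to the assumed default rule.
    """
    if memory_overhead_mb is None:
        memory_overhead_mb = max(
            MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR)
        )
    return executor_memory_mb + memory_overhead_mb


def exceeds_limit(jvm_rss_mb, python_rss_mb, limit_mb, count_python):
    """Return True if the monitored memory is over the container limit.

    count_python models the open question: whether the pyspark.daemon
    processes are included in what YARN measures.
    """
    used_mb = jvm_rss_mb + (python_rss_mb if count_python else 0)
    return used_mb > limit_mb


if __name__ == "__main__":
    limit = container_limit_mb(4096)
    print("limit (MB):", limit)
    print("killed if Python counted:", exceeds_limit(4000, 800, limit, True))
    print("killed if JVM only:", exceeds_limit(4000, 800, limit, False))
```

With these illustrative numbers, the same executor is over the limit only if the Python workers are counted, which is why the answer matters when choosing a value for spark.yarn.executor.memoryOverhead.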
