I'm persisting a DataFrame in Zeppelin, which has dynamic allocation
enabled, to get a sense of how much memory the DataFrame takes up. After I
note the size, I unpersist the DataFrame. For some reason, YARN does not
release the executors that were added for the Zeppelin job afterward. If I
skip the persist and unpersist steps, the added executors are removed about
a minute after the paragraphs complete. The Storage tab in the Spark UI for
the Zeppelin job shows nothing cached.

Is there any way to get YARN to automatically remove executors after a
persist followed by an unpersist, when there is no activity on an executor
within the configured dynamic allocation timeout (as happens without the
persist/unpersist cycle), without having to set
spark.dynamicAllocation.cachedExecutorIdleTimeout? The main reason I'd like
to avoid that setting is that I do not want executors to be reclaimed while
they still hold cached data.
