After a quick look, I don't think the paper's evaluation (<https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf>) is very thorough. I don't see where it discusses what the PageRank implementation is doing in terms of object allocation, or whether data is cached between iterations (it probably isn't, based on Table III). It also doesn't address how this would interact with spark.memory.fraction: I think it would be a problem to set this threshold lower than spark.memory.fraction, because Spark-managed storage can legitimately occupy that fraction of the heap and a GC can't reclaim cached blocks. And the paper doesn't say whether the evaluation used static or dynamic allocation.
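For context on the caching question, here is roughly what a textbook RDD PageRank looks like (this is the standard formulation, not necessarily what the paper ran); whether the link structure is persisted makes a large difference to per-iteration allocation and to how much heap a GC can actually recover:

    // Standard iterative PageRank on RDDs (a sketch, not the paper's code).
    val links = sc.textFile("edges.txt")
      .map { line => val parts = line.split("\\s+"); (parts(0), parts(1)) }
      .groupByKey()
      .persist()  // without this, every iteration re-reads and re-allocates the graph

    var ranks = links.mapValues(_ => 1.0)
    for (_ <- 1 to 10) {
      val contribs = links.join(ranks).values.flatMap {
        case (urls, rank) => urls.map((_, rank / urls.size))
      }
      ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    }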
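And to make the threshold concern concrete, the mechanism under discussion amounts to something like the following (the names here are illustrative; they are not the PR's actual API or config):

    import java.lang.management.ManagementFactory

    // Illustrative sketch only: when an executor goes idle, optimistically
    // collect if heap occupancy is above a configured threshold, so the next
    // task doesn't pay for the GC.
    object IdleGcSketch {
      private val heap = ManagementFactory.getMemoryMXBean

      def maybeGcWhenIdle(threshold: Double): Unit = {
        val usage = heap.getHeapMemoryUsage
        val occupancy = usage.getUsed.toDouble / usage.getMax
        // If threshold < spark.memory.fraction, cached blocks alone can keep
        // occupancy above the threshold, so every idle period triggers a
        // System.gc() (typically a full, stop-the-world collection) that
        // frees very little.
        if (occupancy > threshold) {
          System.gc()
        }
      }
    }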
My impression is that this is clearly a good idea for some allocation-heavy iterative workloads, but it is unclear whether it would help generally (the sketch above shows why):

* An idle executor may delay starting new tasks because of the optimistic GC
* A full GC instead of an incremental one may not be needed and could increase that starting delay
* 1-core executors will always GC between tasks, since they are idle after every task
* Spark-managed memory may cause long GC pauses that don't recover much space, because cached blocks can't be collected
* Dynamic allocation probably eliminates most of the benefit because of executor turnover

rb

On Mon, Dec 31, 2018 at 11:01 AM Reynold Xin <r...@databricks.com> wrote:

> Not sure how reputable or representative that paper is...
>
> On Mon, Dec 31, 2018 at 10:57 AM Sean Owen <sro...@gmail.com> wrote:
>
>> https://github.com/apache/spark/pull/23401
>>
>> Interesting PR; I thought it was not worthwhile until I saw a paper
>> claiming this can speed things up to the tune of 2-6%. Has anyone
>> considered this before?
>>
>> Sean

--
Ryan Blue
Software Engineer
Netflix