After a quick look, I don't think the paper's
<https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf>
evaluation is very thorough. I don't see where it discusses what the
PageRank implementation is doing in terms of object allocation, or whether
data is cached between iterations (it probably isn't, based on Table III).
It also doesn't address how this would interact with spark.memory.fraction:
setting this threshold lower than spark.memory.fraction would be a problem,
because the Spark-managed portion of the heap is mostly not reclaimable by
a collection. And it doesn't say whether the tests used static or dynamic
allocation.
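
To put rough numbers on the spark.memory.fraction point (the 0.6 default
and the 300MB reservation are from UnifiedMemoryManager; the heap size and
the threshold implication are made up for illustration):

    val heapBytes      = 8L * 1024 * 1024 * 1024   // 8GiB executor heap (made up)
    val reservedBytes  = 300L * 1024 * 1024        // Spark's fixed reservation
    val memoryFraction = 0.6                       // spark.memory.fraction default
    val sparkManaged   = ((heapBytes - reservedBytes) * memoryFraction).toLong
    println(f"Spark-managed share of heap: ${sparkManaged.toDouble / heapBytes}%.2f")
    // prints ~0.58: if the trigger requires more than ~42% of the heap to
    // be free, it can't be satisfied while storage memory is full, so full
    // GCs would fire repeatedly and reclaim almost nothing.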

My impression is that this is obviously a good idea for some
allocation-heavy iterative workloads, but it is unclear whether it would
help generally (see the sketch after this list for the kind of check I
mean):

* An idle executor may delay starting tasks because of the optimistic GC
* A full GC instead of an incremental one may not be needed and could
increase the starting delay
* 1-core executors would always GC between tasks
* Spark-managed memory may cause long GC pauses that don't recover much
space
* Dynamic allocation probably eliminates most of the benefit because of
executor turnover
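
Here's the general shape of the check, as I understand it (my own sketch,
not the code in the PR; the threshold and names are made up):

    // Hypothetical idle-GC policy, for illustration only.
    val freeHeapThreshold = 0.45  // made-up trigger: GC when free heap < 45%

    def maybeFullGcWhenIdle(runningTasks: Int): Unit = {
      val rt = Runtime.getRuntime
      val used = rt.totalMemory() - rt.freeMemory()
      val freeFraction = 1.0 - used.toDouble / rt.maxMemory()
      // Collect only while idle, so no running task pays the pause
      // directly; a task that arrives mid-GC still waits, which is the
      // starting-delay concern above.
      if (runningTasks == 0 && freeFraction < freeHeapThreshold) {
        System.gc()  // typically a full, stop-the-world collection
      }
    }

Whether that trade-off pays off depends on how often an idle executor is
handed a task mid-collection, which is exactly where 1-core executors and
dynamic allocation hurt.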

rb

On Mon, Dec 31, 2018 at 11:01 AM Reynold Xin <r...@databricks.com> wrote:

> Not sure how reputable or representative that paper is...
>
> On Mon, Dec 31, 2018 at 10:57 AM Sean Owen <sro...@gmail.com> wrote:
>
>> https://github.com/apache/spark/pull/23401
>>
>> Interesting PR; I thought it was not worthwhile until I saw a paper
>> claiming this can speed things up to the tune of 2-6%. Has anyone
>> considered this before?
>>
>> Sean
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>

-- 
Ryan Blue
Software Engineer
Netflix
