Hi Sandy,

I assume you are referring to the caching added to datanodes via the new caching API exposed through the NN (to preemptively mmap blocks)?
I have not looked in detail, but does the NN tell us about this in block locations? If yes, we can simply make those process-local instead of node-local for executors on that node.
This would simply be a change to Hadoop-based RDD partitioning (what makes it tricky is exposing the currently 'alive' executors to the partition).

Thanks,
Mridul

On 15-May-2014 3:49 am, "Sandy Ryza (JIRA)" <j...@apache.org> wrote:

> Sandy Ryza created SPARK-1767:
> ---------------------------------
>
>              Summary: Prefer HDFS-cached replicas when scheduling data-local tasks
>                  Key: SPARK-1767
>                  URL: https://issues.apache.org/jira/browse/SPARK-1767
>              Project: Spark
>           Issue Type: Improvement
>           Components: Spark Core
>     Affects Versions: 1.0.0
>             Reporter: Sandy Ryza
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)