Is there a way to disable locality based container allocation for Yarn
dynamic allocation?

The issue we're running into is that we have several long-running Structured
Streaming jobs, all using dynamic allocation so they free up resources
between batches. However, when a batch contains a lot of data (or we restart
a job and process the whole backlog), we end up with very uneven cluster
usage: a few nodes at 100% and most of the others at 0%. This also skews
disk space usage, because these jobs always write their first block to the
local drive, so we have to run the balancer aggressively to move data around
and keep drives from filling up.

Ideally we'd like to disable the locality-based allocation and just have the
dynamic allocator spread containers evenly across the cluster, as you get
with non-dynamic allocation (since it doesn't know any locality information
yet). A slight increase in latency from less-local task placement is fine
for us. This might be possible at the YARN queue level, but I was hoping to
find a way to do it at the Spark application level for specific jobs, and I
couldn't find anything. Obviously, when we run analytic queries and the
like, we do want to take advantage of data locality.
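For context, the closest knobs I've found are task-scheduling settings
rather than container-allocation ones. A sketch of what that looks like
(these are real Spark configs, but whether they influence where YARN
actually places containers, as opposed to which tasks run on the executors
once placed, is an assumption on my part):

```shell
# Hedged sketch: these configs govern task-level locality scheduling.
# As far as I can tell there is no documented switch for the locality
# preferences embedded in the YARN container requests themselves.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.locality.wait=0s \
  --conf spark.shuffle.reduceLocality.enabled=false \
  our-streaming-job.jar
```

Setting spark.locality.wait=0s stops the scheduler from delaying tasks
while waiting for node-local or rack-local slots, which helps spread work
across whatever executors exist, but the container requests that dynamic
allocation sends to YARN still carry locality preferences.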



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
