Is there a way to disable locality-based container allocation for YARN dynamic allocation?
The issue we're running into is that we have several long-running structured streaming jobs, all using dynamic allocation so they free up resources between batches. However, when a batch contains a lot of data (or we are restarting a job and processing the whole backlog), we end up with very uneven cluster usage: 100% on a few nodes and 0% on most others. This also produces uneven disk usage, because these jobs always write their first block to the local drive, so we have to run the balancer frequently to move data around and keep drives from filling up.

Ideally we want to disable locality-based allocation and have the dynamic allocator spread containers evenly across the cluster, as happens with non-dynamic allocation (since at startup it has no locality information yet). A slight increase in latency from fewer node-local tasks is acceptable for us.

This might be possible at the YARN queue level, but I was hoping to find a way to do it at the Spark application level for specific jobs, and I couldn't find anything. Obviously, when we run analytic queries and the like, we do want to take advantage of data locality.

-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
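For context, a sketch of the kind of per-application settings involved, assuming a typical spark-submit launch. The dynamic-allocation properties shown are standard Spark configuration; `spark.locality.wait=0` is a possible partial mitigation, but note it only disables the locality wait in *task scheduling* on already-acquired executors, not the locality preferences Spark sends to YARN when requesting containers, so it may not fully solve the uneven placement described above:

```shell
# Hypothetical per-job submission; values are illustrative, not recommendations.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=0 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.locality.wait=0 \
  my-streaming-job.jar
```

Because this is set per application, locality-sensitive analytic jobs can simply omit `spark.locality.wait=0` and keep the default behavior.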