Thanks, Holden, for bringing this up! Maybe another thing to think about is how to make dynamic allocation friendlier to Kubernetes and disaggregated shuffle storage?
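For context, a minimal sketch of what we lean on today for dynamic allocation on K8s without an external shuffle service: shuffle tracking keeps executors that still hold shuffle data alive, which is exactly the constraint a disaggregated shuffle store could relax. The values below are illustrative only, not recommendations:

import org.apache.spark.sql.SparkSession

// Dynamic allocation on K8s today, without an external shuffle service:
// shuffle tracking keeps executors holding shuffle data around until that
// data is no longer needed or the tracking timeout fires.
val spark = SparkSession.builder()
  .appName("dyn-alloc-k8s-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.timeout", "30min")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()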
On Mon, Aug 7, 2023 at 1:27 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> So I'm wondering if there is interest in revisiting some of how Spark is
> doing its dynamic allocation for Spark 4+?
>
> Some things that I've been thinking about:
>
> - Advisory user input (e.g. a way to say after X is done I know I need Y,
>   where Y might be a bunch of GPU machines)
> - Configurable tolerance (e.g. if we are at most Z% over target, no-op)
> - Past runs of the same job (e.g. stage X of job Y had a peak of K)
> - Faster executor launches (I'm a little fuzzy on what we can do here, but
>   one area, for example, is that we set up and tear down an RPC connection
>   to the driver with a blocking call, which does seem to have some locking
>   inside the driver at first glance)
>
> Is this an area other folks are thinking about? Should I make an epic we
> can track ideas in? Or are folks generally happy with today's dynamic
> allocation (or just busy with other things)?
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
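On the advisory-input bullet above: the stage-level scheduling API (Spark 3.1+) already lets a job say "this part of the work needs GPU executors", which might be a starting point to build on. A rough sketch, pasteable into spark-shell; the resource amounts, the stand-in RDDs, and the omitted GPU discovery-script setup are all made up for illustration:

import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

// Executor-side ask for the GPU stage: 4 cores, 16g, 1 GPU per executor.
// (GPU discovery script / vendor settings omitted for brevity.)
val gpuExecutors = new ExecutorResourceRequests().cores(4).memory("16g").resource("gpu", 1)
// Task-side ask: 1 CPU and 1 GPU per task.
val gpuTasks = new TaskResourceRequests().cpus(1).resource("gpu", 1)
val gpuProfile = new ResourceProfileBuilder().require(gpuExecutors).require(gpuTasks).build

// Stand-in for the CPU-only part of the job ("after X is done ...").
val features = sc.parallelize(1 to 1000).map(_ * 2)

// Only work scheduled under this profile asks for GPU executors, so dynamic
// allocation can provision them just for this stage.
val scored = features.withResources(gpuProfile).map(_.toDouble)
scored.count()

This covers "give this stage GPUs", but not the sizing hints, the tolerance knob, or learning from past runs, which seem like the genuinely new pieces here.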