You might want to look at "Nephele: Efficient Parallel Data Processing in the
Cloud" (Warneke & Kao, 2009):
http://stratosphere.eu/assets/papers/Nephele_09.pdf
This was some of the work done in the research project which gave birth to
Flink, though this bit didn't surface as they chose to leave VM a
>
> The goal of the project is to develop an algorithm that automatically
> scales the cluster up and down based on the volume of data processed by the
> application.
By "scale the cluster up and down" do you mean:
1) adding/removing Spark executors based on the load? How is that different from the
dynami