This question arises when Spark is offered as a managed service on a cluster of VMs in the cloud, for example Google Dataproc <https://cloud.google.com/dataproc> or Amazon EMR <https://aws.amazon.com/emr/>, among others.
From what I can see in the autoscaling setup, you will always need a minimum of two worker nodes as primary. It also states, and I quote: "Scaling primary workers is not recommended due to HDFS limitations which result in instability while scaling. These limitations do not exist for secondary workers". So scaling is done through the secondary workers, by specifying the minimum and maximum number of instances. The so-called autoscaling cooldown duration also defaults to 2 minutes, presumably to allow time to bring new executors online. My assumption is that task allocation to the new executors is FIFO for new tasks.

This link <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.&text=By%20default%2C%20Spark%20uses%20a%20static%20allocation%20of%20resources.> gives some explanation of autoscaling in Spark clusters. On handling spot node loss, it says that when the Spark YARN Application Master (AM) receives the spot loss notification from the YARN Resource Manager (RM), it notifies the Spark driver. The driver then performs the following actions:

1. Identifies all the executors affected by the upcoming node loss.
2. Moves all of the affected executors into a decommissioning state, so that no new tasks are scheduled on them.
3. Kills all of the affected executors after 50% of the termination time has elapsed.
4. Starts the failed tasks on the remaining executors.
5. For these nodes, removes all the entries for their shuffle data from the map output tracker on the driver after 90% of the termination time has elapsed. This helps prevent shuffle-fetch failures due to spot loss.
6. Recomputes the shuffle data from the lost node through stage resubmission, if that shuffle data is required.

My conclusion is that when a node fails, classic Spark recovery comes into play: no new nodes are added, even with autoscaling enabled, and the failed tasks are redistributed among the existing executors. In other words, autoscaling does not deal with failed nodes?
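For context, here is a minimal PySpark sketch of the dynamic allocation and graceful decommissioning settings I assume are in play here. The property names are from open-source Spark (dynamic allocation, plus the decommissioning options added in Spark 3.1); the executor counts are purely illustrative, and Dataproc/EMR/Qubole presumably layer their own node-loss notification mechanisms on top of this, so treat it as an assumption rather than a statement of how those services are actually wired up.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("autoscaling-sketch")
    # Dynamic allocation: the job asks YARN for more executors when tasks
    # back up, and releases executors that sit idle (illustrative counts).
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    # External shuffle service so shuffle files survive executor removal.
    .config("spark.shuffle.service.enabled", "true")
    # Graceful decommissioning (Spark 3.1+): stop scheduling on executors
    # that are about to be lost and migrate their blocks before they go.
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    .getOrCreate()
)

As I understand it, the cluster-level autoscaler (for example on Dataproc) adds or removes worker VMs based on YARN memory metrics, while dynamic allocation only governs how the job itself requests and releases executors, which is why I suspect a lost node is handled by task re-execution on the surviving executors rather than by scaling up.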