This question arises when Spark is offered as a managed service on a cluster of VMs in the cloud, for example Google Dataproc <https://cloud.google.com/dataproc> or Amazon EMR <https://aws.amazon.com/emr/>, among others.
From what I can see in the autoscaling setup, you will always need a minimum of two worker nodes as primary. It also states, and I quote: "Scaling primary workers is not recommended due to HDFS limitations which result in instability while scaling. These limitations do not exist for secondary workers". So scaling is done through the secondary workers, by specifying the minimum and maximum number of instances. The so-called autoscaling cooldown duration also defaults to 2 minutes, presumably to allow time to bring new executors online. My assumption is that task allocation to the new executors is FIFO for new tasks.

This link <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.&text=By%20default%2C%20Spark%20uses%20a%20static%20allocation%20of%20resources.> gives some explanation of autoscaling in Spark clusters. On handling spot node loss, it says that when the Spark YARN Application Master (AM) receives the spot loss notification from the YARN Resource Manager (RM), it notifies the Spark driver. The driver then performs the following actions:

1. Identifies all the executors affected by the upcoming node loss.
2. Moves all of the affected executors into a decommissioning state, so that no new tasks are scheduled on them.
3. Kills all of the affected executors after 50% of the termination time has elapsed.
4. Starts the failed tasks on the remaining executors.
5. For these nodes, removes all the entries for their shuffle data from the map output tracker on the driver after 90% of the termination time has elapsed. This helps prevent shuffle-fetch failures due to spot loss.
6. Recomputes the shuffle data from the lost node through stage resubmission, if that shuffle data is required.

My conclusion is that when a node fails, classic Spark recovery comes into play: no new nodes are added, even with autoscaling enabled, and the failed tasks are redistributed among the existing executors. In other words, autoscaling does not deal with failed nodes?
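For context, here is a minimal PySpark sketch of the dynamic allocation and graceful decommissioning settings I assume are in play here. The property names are from open-source Spark (dynamic allocation, plus the decommissioning options added in Spark 3.1); the executor counts are purely illustrative, and Dataproc/EMR/Qubole presumably layer their own node-loss notification mechanisms on top of this, so treat it as an assumption rather than a statement of how those services are actually wired up.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("autoscaling-sketch")
    # Dynamic allocation: the job asks YARN for more executors when tasks
    # back up, and releases executors that sit idle (illustrative counts).
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    # External shuffle service so shuffle files survive executor removal.
    .config("spark.shuffle.service.enabled", "true")
    # Graceful decommissioning (Spark 3.1+): stop scheduling on executors
    # that are about to be lost and migrate their blocks before they go.
    .config("spark.decommission.enabled", "true")
    .config("spark.storage.decommission.enabled", "true")
    .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
    .getOrCreate()
)

As I understand it, the cluster-level autoscaler (for example on Dataproc) adds or removes worker VMs based on YARN memory metrics, while dynamic allocation only governs how the job itself requests and releases executors, which is why I suspect a lost node is handled by task re-execution on the surviving executors rather than by scaling up.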