Hi,

What is the Spark version, and what type of cluster is it: Spark on Dataproc or something else?
HTH

View my LinkedIn profile: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
https://en.everybodywiki.com/Mich_Talebzadeh

On Mon, 27 Feb 2023 at 09:06, murat migdisoglu <murat.migdiso...@gmail.com> wrote:

> On an auto-scaling cluster using YARN as the resource manager, we observed
> that when we decrease the number of worker nodes after upscaling instance
> types, the number of tasks for the same Spark job spikes (the total
> CPU/memory capacity of the cluster remains identical).
>
> The same Spark job, with the same Spark settings (dynamic allocation is
> on), spins up 4-5 times more tasks. Related to that, we see 4-5 times more
> executors being allocated.
>
> As far as I understand, dynamic allocation decides to start a new executor
> if it sees pending tasks queueing up. But I don't know why the same Spark
> application with identical input files runs 4-5 times as many tasks.
>
> Any clues would be much appreciated. Thank you.
>
> Murat
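
For context on why the task count can change even with identical input files: for file-based sources, Spark derives the split size from the total input bytes and the default parallelism, and the latter tracks the cores currently visible to the application. The sketch below paraphrases the sizing logic in Spark's FilePartition; it is a simplified illustration, not the exact source code, although the 128 MB and 4 MB defaults shown are the stock values of the named configs.

    // Simplified sketch of Spark's file-split sizing, paraphrased from
    // org.apache.spark.sql.execution.datasources.FilePartition.
    // Illustrative only, not the exact source.
    object SplitSizing {
      def maxSplitBytes(
          totalBytes: Long,                              // sum of input file sizes (plus open cost)
          defaultParallelism: Int,                       // scales with cores currently registered
          maxPartitionBytes: Long = 128L * 1024 * 1024,  // spark.sql.files.maxPartitionBytes default
          openCostInBytes: Long = 4L * 1024 * 1024       // spark.sql.files.openCostInBytes default
      ): Long = {
        val bytesPerCore = totalBytes / defaultParallelism
        // Splits shrink (and task counts grow) as defaultParallelism grows.
        math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
      }
    }

One possible mechanism, then: under dynamic allocation, defaultParallelism depends on the executors registered when the plan is built, so a change in instance types or node counts can shift it even at identical total capacity, shrinking splits and multiplying tasks. The longer backlog of pending tasks in turn causes dynamic allocation (governed by spark.dynamicAllocation.schedulerBacklogTimeout and capped by spark.dynamicAllocation.maxExecutors) to request more executors.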