Hi Murat,

I have not dealt with EMR, but I have used a Spark cluster on Google
Dataproc (Spark 3.1.1) with an autoscaling policy.

My understanding is that the autoscaling policy decides how and when to
scale, without manual intervention. Is that the case with yours?
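For context, a Dataproc autoscaling policy is a YAML resource you attach to
the cluster. A minimal sketch is below; the field names come from the
Dataproc autoscaling policy API, but the specific values here are
illustrative assumptions, not a recommendation:

```yaml
# Minimal Dataproc autoscaling policy sketch (illustrative values).
workerConfig:
  minInstances: 2        # never scale below 2 primary workers
  maxInstances: 10       # upper bound on primary workers
basicAlgorithm:
  cooldownPeriod: 4m     # wait between scaling evaluations
  yarnConfig:
    scaleUpFactor: 0.5   # fraction of pending YARN memory to add capacity for
    scaleDownFactor: 1.0 # remove all capacity that YARN reports as idle
    gracefulDecommissionTimeout: 1h  # let running containers finish first
```

Such a policy would typically be registered with something like
`gcloud dataproc autoscaling-policies import my-policy --source=policy.yaml`
(the policy name `my-policy` is a placeholder) and then referenced when
creating the cluster.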


HTH


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 27 Feb 2023 at 14:16, murat migdisoglu <murat.migdiso...@gmail.com>
wrote:

> Hey Mich,
> This cluster is running Spark 2.4.6 on EMR.
>
> On Mon, Feb 27, 2023 at 12:20 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> What is the Spark version, and what type of cluster is it: Spark on
>> Dataproc or something else?
>>
>> HTH
>>
>>
>> On Mon, 27 Feb 2023 at 09:06, murat migdisoglu <
>> murat.migdiso...@gmail.com> wrote:
>>
>>> On an auto-scaling cluster using YARN as the resource manager, we observed
>>> that when we decrease the number of worker nodes after scaling up the
>>> instance types, the number of tasks for the same Spark job spikes (the
>>> total CPU/memory capacity of the cluster remains identical).
>>>
>>> The same Spark job, with the same Spark settings (dynamic allocation is
>>> on), spins up 4-5 times more tasks. Correspondingly, we see 4-5 times more
>>> executors being allocated.
>>>
>>> As far as I understand, dynamic allocation decides to start a new executor
>>> when it sees pending tasks queueing up. But I don't know why the same
>>> Spark application, with identical input files, runs 4-5 times as many
>>> tasks.
>>>
>>> Any clues would be much appreciated, thank you.
>>>
>>> Murat
>>>
>>>
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
>
