Hi Rex,

You should configure the number of slots per TaskManager to be the number
of cores of a machine/node. In total you will then have a cluster with
#slots = #cores per machine x #machines.

If you have a cluster with 4 nodes and 8 slots each, then you have a total
of 32 slots. Now if you have a job A which you start with a parallelism of
20, then you have 12 slots left. Hence, you could make use of these 12
slots by starting a job B with a parallelism of 12.
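
The slot arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of Flink: the helper names `total_slots` and `remaining_slots` are made up for this example, and it assumes that each job occupies as many slots as its parallelism (which is the case when all operators of a job share the default slot sharing group).

```python
from typing import Sequence


def total_slots(machines: int, slots_per_machine: int) -> int:
    """Cluster-wide slot count: #slots per machine x #machines."""
    return machines * slots_per_machine


def remaining_slots(total: int, job_parallelisms: Sequence[int]) -> int:
    """Slots still free after starting jobs with the given parallelisms.

    Assumption: a job with parallelism p occupies exactly p slots.
    """
    used = sum(job_parallelisms)
    if used > total:
        raise ValueError(f"jobs need {used} slots, but only {total} exist")
    return total - used


# Example from this mail: 4 nodes with 8 slots each, job A with parallelism 20.
cluster = total_slots(machines=4, slots_per_machine=8)
free_for_job_b = remaining_slots(cluster, [20])
print(cluster, free_for_job_b)
```

In an actual setup, the slots per TaskManager would be set via `taskmanager.numberOfTaskSlots` in the Flink configuration, and the parallelism of each job could be given at submission time, for example with `flink run -p 20`.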

Cheers,
Till

On Fri, Nov 6, 2020 at 7:20 PM Rex Fenley <r...@remind101.com> wrote:

> Great, thanks!
>
> So just to confirm, configure # of task slots to # of core nodes x # of
> vCPUs?
>
> I'm not sure what you mean by "distribute them across both jobs (so that
> the total adds up to 32)". Is it configurable how many task slots a job can
> receive, so in this case I'd provide ~30/36 * 32 task slots for one job and
> ~6/36 * 32 for another job, but even them out to sum to 32 task slots?
>
> Thanks
>
> On Fri, Nov 6, 2020 at 10:01 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Rex,
>>
>> As a rule of thumb I recommend configuring your TMs with as many slots as
>> they have cores. So in your case your cluster would have 32 slots. Then
>> depending on the workload of your jobs you should distribute them across
>> both jobs (so that the total adds up to 32). A high number of operators
>> does not necessarily mean that it needs more slots since operators can
>> share the same slot. It mostly depends on the workload of your job. If the
>> job turns out to be too slow, then you would have to increase the cluster
>> resources.
>>
>> Cheers,
>> Till
>>
>> On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:
>>
>>> Hello,
>>>
>>> I'm running a job on AWS EMR with the Table API that does a long series
>>> of Joins, GroupBys, and Aggregates and I'd like to know how to best tune
>>> parallelism.
>>>
>>> In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of
>>> memory. There's a job we have to run that has ~30 table operators. Given
>>> this, how should I calculate what to set the system's parallelism to?
>>>
>>> I also plan on running a second job on the same system, but just with 6
>>> operators. Will this change the calculation for parallelism at all?
>>>
>>> Thanks!
>>>
>>> --
>>>
>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>
>
