Hi Rex,

As a rule of thumb, I recommend configuring each TaskManager (TM) with as
many slots as it has cores. In your case that gives the cluster 8 x 4 = 32
slots. Then, depending on the workload of each job, you should divide those
slots between the two jobs (so that the total adds up to 32). A high number
of operators does not necessarily mean that a job needs more slots, since
operators can share the same slot. What matters most is the workload of the
job. If a job turns out to be too slow, you would have to increase the
cluster resources.
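
To make the arithmetic concrete, here is a rough sketch in Python. It assumes
one TM per core node, and the job names and the 3:1 workload weights are made
up for illustration; you would have to estimate the real weights from how
heavy each job actually is, not from its operator count.

```python
def total_slots(num_task_managers, cores_per_tm):
    """Rule of thumb: one slot per core on every TaskManager."""
    return num_task_managers * cores_per_tm


def split_slots(total, weights):
    """Split `total` slots across jobs in proportion to `weights`.

    Uses largest-remainder rounding so the shares always sum to `total`.
    """
    weight_sum = sum(weights.values())
    exact = {job: total * w / weight_sum for job, w in weights.items()}
    shares = {job: int(x) for job, x in exact.items()}
    leftover = total - sum(shares.values())
    # Give the remaining slots to the jobs with the largest fractional parts.
    by_remainder = sorted(
        exact, key=lambda job: exact[job] - shares[job], reverse=True
    )
    for job in by_remainder[:leftover]:
        shares[job] += 1
    return shares


# 8 core nodes with 4 vCores each, assuming one TM per node.
slots = total_slots(num_task_managers=8, cores_per_tm=4)

# Hypothetical weights: the large job is guessed to be 3x as heavy.
parallelism = split_slots(slots, {"large_job": 3, "small_job": 1})

print(slots, parallelism)  # 32 {'large_job': 24, 'small_job': 8}
```

The per-TM slot count (4 here) is what you would set as
taskmanager.numberOfTaskSlots, and each job's share would be the parallelism
you submit that job with.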

Cheers,
Till

On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:

> Hello,
>
> I'm running a job on AWS EMR with the Table API that does a long series of
> Joins, GroupBys, and Aggregates and I'd like to know how to best tune
> parallelism.
>
> In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of
> memory. There's a job we have to run that has ~30 table operators. Given
> this, how should I calculate what to set the system's parallelism to?
>
> I also plan on running a second job on the same system, but just with 6
> operators. Will this change the calculation for parallelism at all?
>
> Thanks!
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>
