Hi Rex,

As a rule of thumb, I recommend configuring your TaskManagers (TMs) with as many slots as they have cores. In your case (8 nodes with 4 vCores each), your cluster would therefore have 32 slots. Then, depending on the workload of your jobs, you should distribute these slots across both jobs so that their parallelisms add up to 32. A high number of operators does not necessarily mean that a job needs more slots, since operators can share the same slot; what matters is mostly the workload of the job. If the job turns out to be too slow, you would have to increase the cluster resources.
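To make the arithmetic concrete, here is a minimal Python sketch. It assumes one slot per core (in Flink terms, `taskmanager.numberOfTaskSlots` set to the number of cores per TM) and a single default slot-sharing group per job. The function names and the 3:1 workload split are hypothetical and only for illustration. The `slots_needed` helper encodes the slot-sharing point: by default a job needs as many slots as its highest operator parallelism, not one slot per operator.

```python
from __future__ import annotations


def total_slots(nodes: int, cores_per_node: int) -> int:
    """Rule of thumb: one slot per core on every TaskManager."""
    return nodes * cores_per_node


def slots_needed(operator_parallelisms: list[int]) -> int:
    """With default slot sharing, subtasks of different operators of the same
    job can share a slot, so a job needs as many slots as its highest operator
    parallelism (not the number of operators)."""
    return max(operator_parallelisms)


def split_parallelism(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split the cluster's slots across jobs in proportion to a hypothetical
    relative workload weight, so that the parallelisms add up to `total`."""
    if total < len(weights):
        raise ValueError("need at least one slot per job")
    weight_sum = sum(weights.values())
    # Round each job's proportional share down first.
    shares = {job: int(total * w / weight_sum) for job, w in weights.items()}
    # Give the slots lost to rounding to the job with the largest weight.
    heaviest = max(weights, key=weights.get)
    shares[heaviest] += total - sum(shares.values())
    if min(shares.values()) < 1:
        raise ValueError("a job ended up with zero slots; adjust the weights")
    return shares


if __name__ == "__main__":
    slots = total_slots(nodes=8, cores_per_node=4)
    print(slots)  # 32
    # ~30 operators all running at parallelism 24 still fit into 24 slots.
    print(slots_needed([24] * 30))  # 24
    # Hypothetical assumption: the large join/aggregate job does ~3x the work.
    print(split_parallelism(slots, {"big_job": 3, "small_job": 1}))
```

With the assumed 3:1 weighting, the last call yields 24 slots for the large job and 8 for the small one. The weights are a stand-in for whatever you measure; the operator counts (30 vs. 6) are deliberately not used as weights, since operator count alone does not determine slot demand.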
Cheers,
Till

On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:
> Hello,
>
> I'm running a job on AWS EMR with the Table API that does a long series of
> Joins, GroupBys, and Aggregates, and I'd like to know how to best tune
> parallelism.
>
> In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of
> memory. There's a job we have to run that has ~30 table operators. Given
> this, how should I calculate what to set the system's parallelism to?
>
> I also plan on running a second job on the same system, but with just 6
> operators. Will this change the calculation for parallelism at all?
>
> Thanks!
>
> --
>
> Rex Fenley | Software Engineer - Mobile and Backend
>
> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> |
> FOLLOW US <https://twitter.com/remindhq> | LIKE US
> <https://www.facebook.com/remindhq>