Hi Rex,

You should configure the number of slots per TaskManager to match the number of cores of the machine/node it runs on. In total, your cluster will then have #slots = #cores per machine x #machines.
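
To make the slot arithmetic concrete, here is a minimal Python sketch. It is not Flink API: the helper names `total_slots` and `remaining_slots` are made up for illustration, and it assumes default slot sharing, where a job occupies as many slots as its parallelism.

```python
from typing import Iterable


def total_slots(machines: int, cores_per_machine: int) -> int:
    """Rule of thumb: one slot per core, so #slots = #cores per machine x #machines."""
    return machines * cores_per_machine


def remaining_slots(total: int, parallelisms: Iterable[int]) -> int:
    """Slots left over once every job has taken as many slots as its parallelism.

    Assumption: default slot sharing, so a job needs exactly `parallelism` slots.
    """
    used = sum(parallelisms)
    if used > total:
        raise ValueError(f"jobs need {used} slots but the cluster only has {total}")
    return total - used


# 8 nodes with 4 vCores each, one slot per core.
cluster = total_slots(machines=8, cores_per_machine=4)
print(cluster)                             # 32
print(remaining_slots(cluster, [20]))      # 12 slots free after job A
print(remaining_slots(cluster, [20, 12]))  # 0 slots free after jobs A and B
```

The error branch reflects that the parallelisms of all running jobs have to fit into the slots the cluster offers.
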
If you have a cluster with 4 nodes and 8 slots each, then you have a total of 32 slots. If you start a job A with a parallelism of 20, then you have 12 slots left. Hence, you could make use of these 12 slots by starting a job B with a parallelism of 12.

Cheers,
Till

On Fri, Nov 6, 2020 at 7:20 PM Rex Fenley <r...@remind101.com> wrote:

> Great, thanks!
>
> So just to confirm, configure # of task slots to # of core nodes x # of
> vCPUs?
>
> I'm not sure what you mean by "distribute them across both jobs (so that
> the total adds up to 32)". Is it configurable how many task slots a job can
> receive, so in this case I'd provide ~30/36 * 32 task slots for one job and
> ~6/36 * 32 for another job, but even them out to sum to 32 task slots?
>
> Thanks
>
> On Fri, Nov 6, 2020 at 10:01 AM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi Rex,
>>
>> as a rule of thumb I recommend configuring your TMs with as many slots as
>> they have cores. So in your case your cluster would have 32 slots. Then
>> depending on the workload of your jobs you should distribute them across
>> both jobs (so that the total adds up to 32). A high number of operators
>> does not necessarily mean that it needs more slots since operators can
>> share the same slot. It mostly depends on the workload of your job. If the
>> job should be too slow, then you would have to increase the cluster
>> resources.
>>
>> Cheers,
>> Till
>>
>> On Fri, Nov 6, 2020 at 12:21 AM Rex Fenley <r...@remind101.com> wrote:
>>
>>> Hello,
>>>
>>> I'm running a Job on AWS EMR with the TableAPI that does a long series
>>> of Joins, GroupBys, and Aggregates and I'd like to know how to best tune
>>> parallelism.
>>>
>>> In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of
>>> memory. There's a job we have to run that has ~30 table operators. Given
>>> this, how should I calculate what to set the system's parallelism to?
>>>
>>> I also plan on running a second job on the same system, but just with 6
>>> operators. Will this change the calculation for parallelism at all?
>>>
>>> Thanks!
>>>
>>> --
>>>
>>> Rex Fenley | Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/>
>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>