Hello, I'm running a job on AWS EMR with the Table API that performs a long series of joins, group-bys, and aggregations, and I'd like to know how best to tune parallelism.
In my case, I have 8 EMR core nodes set up, each with 4 vCores and 8 GiB of memory. One job we have to run has ~30 table operators. Given this, how should I calculate what to set the system's parallelism to? I also plan to run a second job on the same system, but with only 6 operators. Will this change the calculation for parallelism at all?

Thanks!

--
Rex Fenley | Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/>