Hi everyone,

I see that Hive on Tez chooses the number of tasks to launch for each vertex of the generated DAG dynamically, based on cluster load in addition to data size. For research purposes I'd like to disable this behavior: I need every query (running on the same datasets) to be executed with the same number of tasks regardless of the state of the cluster, so that if I run query X, exactly n tasks are allocated every time.

I can't run tests under heavy workloads at the moment, so I'd like to ask whether setting tez.am.grouping.min-size and tez.am.grouping.max-size to the same value would do the trick, or whether you have a better suggestion for achieving this.

Apart from this feature, is there anything else that could change the number of splits across different runs of the same query?
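For reference, this is roughly what I have in mind, as a sketch of the session settings rather than something I have verified. The 128 MB value is an arbitrary placeholder, and I am assuming the property names above are the right ones for my Tez version (some releases may use a tez.grouping.* prefix instead):

```sql
-- Run the query on Tez.
SET hive.execution.engine=tez;

-- Pin the grouped split size by making the lower and upper bounds equal.
-- 134217728 bytes = 128 MB; placeholder value, chosen only for illustration.
SET tez.am.grouping.min-size=134217728;
SET tez.am.grouping.max-size=134217728;
```

My expectation is that with both bounds equal the split count would depend only on the input size, but that is exactly the part I can't confirm without a loaded cluster.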

Thanks a lot,
Fabio