Hi everyone,
I see that Hive on Tez dynamically chooses the number of tasks to launch
for each vertex in the generated DAG based on the current cluster load, in
addition to the data size.
For research purposes I'd like to disable this behavior, because I need
every query (running on the same datasets) to be executed with the same
number of tasks regardless of the state of the cluster (if I run query X,
exactly n tasks must be allocated every time).
At the moment I can't run tests under heavy workloads, so I'd like to ask
whether you think setting tez.am.grouping.min-size and
tez.am.grouping.max-size to the same value would do the trick, or whether
you have a better suggestion for achieving this behavior.
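
To make the idea concrete, here is a rough Python sketch of how I understand
the two bounds to interact. It is my own simplified model, not Tez's actual
code: the function name, the load-dependent desired_splits input, and the
sizes are all assumptions made up for illustration.

```python
def grouped_split_count(total_bytes, desired_splits, min_size, max_size):
    """Hypothetical model of split grouping (an assumption, not Tez code).

    desired_splits stands in for whatever count the cluster load suggests;
    the resulting bytes per group are clamped into [min_size, max_size].
    """
    if total_bytes <= 0:
        return 0
    desired_splits = max(1, desired_splits)
    bytes_per_group = total_bytes // desired_splits
    # Clamp the group size into the configured bounds.
    bytes_per_group = min(max(bytes_per_group, min_size), max_size)
    # Ceiling division: number of groups needed to cover the input.
    return -(-total_bytes // bytes_per_group)


MIB = 1024 * 1024
TOTAL = 10 * 1024 * MIB  # 10 GiB of input, an arbitrary example size

# Wide bounds: the count follows the load-dependent estimate.
wide = [grouped_split_count(TOTAL, d, 16 * MIB, 1024 * MIB) for d in (10, 40)]

# min == max: the estimate is clamped away, only the data size matters.
pinned = [grouped_split_count(TOTAL, d, 128 * MIB, 128 * MIB)
          for d in (1, 10, 40, 1000)]

print(wide, pinned)
```

If this model is close to what really happens, equal bounds would leave the
count depending only on the input size, which is the behavior I'm after.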
Apart from this feature, is there anything else that could change the
number of splits across different runs of the same query?

Thanks a lot

Fabio
