Thanks Siddharth,
at first glance I couldn't find an option in Hive to disable split
grouping, but I will look into it and possibly try the min/max setting for
split size.
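The min/max idea can be illustrated with a small, hypothetical model of
size-bounded split grouping. The function `estimate_num_groups`, its
parameters, and the "free slots times a waves factor" target are
assumptions for illustration, not Tez's actual implementation (which also
groups by data locality and can therefore still vary a little between
runs). The point is that clamping the bytes per group to [min, max] makes
the cluster-driven target irrelevant once min == max.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def estimate_num_groups(total_bytes, available_slots, waves, min_size, max_size):
    """Hypothetical, simplified model of size-bounded split grouping.

    Not Tez's actual code: it only shows how clamping bytes-per-group to
    [min_size, max_size] interacts with a cluster-driven target.
    """
    # Assumed cluster-driven target: fill the free slots `waves` times.
    target_groups = max(1, int(available_slots * waves))
    # Bytes per group implied by that target, clamped to [min_size, max_size].
    bytes_per_group = max(1, total_bytes // target_groups)
    bytes_per_group = min(max(bytes_per_group, min_size), max_size)
    # Ceiling division: number of groups needed to cover all input bytes.
    return (total_bytes + bytes_per_group - 1) // bytes_per_group


if __name__ == "__main__":
    total = 10 * GiB
    for slots in (10, 100, 500):
        pinned = estimate_num_groups(total, slots, 1.7, 256 * MiB, 256 * MiB)
        ranged = estimate_num_groups(total, slots, 1.7, 16 * MiB, 1 * GiB)
        print(f"slots={slots:3d}  min==max -> {pinned}  wide range -> {ranged}")
```

In this model, 10 GiB of input with min == max == 256 MiB always yields 40
groups, whatever the number of free slots, while a wide [16 MiB, 1 GiB]
range lets the group count follow the cluster state.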

Thanks a lot

Fabio

On Thu, Feb 19, 2015 at 11:02 AM, Siddharth Seth <ss...@apache.org> wrote:

> Fabio,
> One of the simplest ways to achieve this is to disable split grouping
> completely. You may end up with a large number of tasks in this case,
> though. This gets rid of the dynamic split generation based on cluster
> load. (You'll have to check with Hive on how to disable this.)
> Other than this, setting min-size and max-size to the same value should
> produce the desired results; there can be some variance in the groups
> generated, though, depending on the order in which HDFS returns its block
> locations.
>
>
> On Thu, Feb 19, 2015 at 1:47 AM, Fabio C. <anyte...@gmail.com> wrote:
>
>> Hi everyone,
>> I see that Hive on Tez dynamically chooses the number of tasks to launch
>> for each vertex in the generated DAG according to cluster load (in
>> addition to data size).
>> For research purposes I'd like to avoid this feature since I need every
>> query (running on the same datasets) to be executed with the same number of
>> tasks, regardless of the state of the cluster (if I run query X, n tasks
>> have to be allocated in any case).
>> At the moment I can't run tests under heavy workloads, so I'd like to ask
>> whether you think setting tez.am.grouping.min-size and
>> tez.am.grouping.max-size to the same value would do the trick, or whether
>> you have a better suggestion for achieving this behavior.
>> Other than this feature, is there anything else that could change the
>> number of splits across different runs of the same query?
>>
>> Thanks a lot
>>
>> Fabio
>>
>>
>
