And what version of Hive are you running your test on? I believe, though I
am not certain, that Hive 0.11 includes the optimization you seek.


On Thu, Aug 1, 2013 at 10:19 AM, Chen Song <chen.song...@gmail.com> wrote:

> Suppose we have 2 simple tables
>
> A
> id int
> value string
>
> B
> id
>
> When Hive translates the following query
>
> select max(A.value), A.id from A join B on A.id = B.id group by A.id;
>
> It launches 2 stages, one for the join and one for the group by.
>
> My understanding is that if the join key set is a subset of the group by
> key set, both can be done in the same MapReduce job. If that is correct
> in theory, could it be a feature in Hive?
>
> Chen
>
>
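
To make the theory in the question concrete, here is a rough Python sketch of
why one shuffle can be enough when the join key and the group by key are both
id. It is only an illustration of the idea, not how Hive implements it. The
function name and the sample rows are made up for this example, and rows with
a NULL id are ignored.

```python
from collections import defaultdict


def join_then_group_single_pass(a_rows, b_rows):
    """Simulate `select max(A.value), A.id from A join B on A.id = B.id
    group by A.id` with a single partitioning step.

    a_rows: iterable of (id, value) tuples (hypothetical sample of table A)
    b_rows: iterable of ids (hypothetical sample of table B)
    Returns a dict mapping id -> max(value).
    """
    # "Shuffle": both inputs are partitioned by the same key (id), so every
    # row needed for the join and for the aggregate lands in the same group.
    groups = defaultdict(lambda: {"a_values": [], "b_count": 0})
    for row_id, value in a_rows:
        groups[row_id]["a_values"].append(value)
    for row_id in b_rows:
        groups[row_id]["b_count"] += 1

    # "Reduce": for each id, apply the inner-join condition and compute the
    # aggregate in the same pass. max() is unaffected by the row duplication
    # a join would produce, so the joined rows need not be materialized.
    result = {}
    for row_id, group in groups.items():
        if group["a_values"] and group["b_count"] > 0:
            result[row_id] = max(group["a_values"])
    return result


if __name__ == "__main__":
    a = [(1, "apple"), (1, "pear"), (2, "fig"), (3, "kiwi")]
    b = [1, 1, 3, 4]
    print(join_then_group_single_pass(a, b))
```

Ids 2 and 4 drop out because they appear on only one side of the join. To see
what Hive itself does on a given version, running EXPLAIN on the query and
counting the MapReduce stages in the plan is the reliable check.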
