And what version of Hive are you running your test on? I believe, though I am not certain, that Hive 0.11 includes the optimization you are looking for.
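
One way to check on your own install is to compare the plans that EXPLAIN prints. What follows is only a rough sketch, not a confirmed recipe. The table definitions are my guess at your schema, and I am assuming the join condition is meant to be A.id = B.id. The two SET properties are assumptions as well: I believe hive.optimize.reducededuplication and hive.optimize.correlation are the relevant switches, but the second one in particular may not be available in your release, in which case setting it will not change the plan.

```sql
-- Hypothetical tables mirroring the ones described in the question.
CREATE TABLE A (id INT, value STRING);
CREATE TABLE B (id INT);

-- Baseline plan. The STAGE DEPENDENCIES section of the output lists
-- the stages Hive intends to launch for the query.
EXPLAIN
SELECT max(A.value), A.id
FROM A JOIN B ON A.id = B.id
GROUP BY A.id;

-- Assumption: these optimizer switches exist in the release under test.
SET hive.optimize.reducededuplication=true;
SET hive.optimize.correlation=true;

-- Same query again; compare the number of map reduce stages
-- against the baseline plan above.
EXPLAIN
SELECT max(A.value), A.id
FROM A JOIN B ON A.id = B.id
GROUP BY A.id;
```

If the second plan still shows separate stages for the join and the group by, then the optimization is either not in your version or not triggered by this query shape.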

On Thu, Aug 1, 2013 at 10:19 AM, Chen Song <chen.song...@gmail.com> wrote:
> Suppose we have 2 simple tables
>
> A
> id int
> value string
>
> B
> id
>
> When hive translates the following query
>
> select max(A.value), A.id from A join B on A.id = A.id group by A.id;
>
> It launches 2 stages, one for the join and one for the group by.
>
> My understanding is that if the join key set is a sub set of the group by
> key set, it can be achieved in the same map reduce job. If that is correct
> in theory, could it be a feature in hive?
>
> Chen