Re: [jira] [Assigned] (HIVE-1772) optimize join followed by a groupby

Jie Li Tue, 06 Dec 2011 16:27:19 -0800

I happened to notice this as well.

From the query plan, Hive already considers the group-by in the first job,
so the second job is very fast. But it's still better to eliminate the
second job.
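The optimization can be illustrated with a minimal Python sketch. This is not Hive code; it only simulates the reduce side of the query in the issue, `SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) group by x.key`. It assumes each row is reduced to its join key and that the shuffle delivers rows sorted by key, tagged with their source table. The function names (`shuffle`, `two_job_plan`, `single_job_plan`) are hypothetical. Because the group-by key `x.key` is also the join key, each reducer call already sees every row for its key, so `count(1)` is the product of the row counts from the two sides. This means no second job is needed.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(x_keys, y_keys):
    """Hypothetical stand-in for map + shuffle: tag each key with its source
    table (0 = src1 x, 1 = src y) and sort by key, as the reducer would see it."""
    tagged = [(k, 0) for k in x_keys] + [(k, 1) for k in y_keys]
    return sorted(tagged, key=itemgetter(0))


def two_job_plan(x_keys, y_keys):
    """Current plan: job 1 materializes joined rows, job 2 groups and counts them."""
    joined = []
    for key, group in groupby(shuffle(x_keys, y_keys), key=itemgetter(0)):
        tags = [tag for _, tag in group]
        # Inner join emits one row per (x, y) pair; zero pairs if a side is empty.
        joined.extend([key] * (tags.count(0) * tags.count(1)))
    # Second job: a separate group-by pass over the join output.
    counts = {}
    for key in joined:
        counts[key] = counts.get(key, 0) + 1
    return sorted(counts.items())


def single_job_plan(x_keys, y_keys):
    """Proposed plan: the join reducer aggregates in place, since its input is
    already grouped by the join key, which is also the group-by key."""
    result = []
    for key, group in groupby(shuffle(x_keys, y_keys), key=itemgetter(0)):
        tags = [tag for _, tag in group]
        n_x, n_y = tags.count(0), tags.count(1)
        if n_x and n_y:  # inner join: the key must appear on both sides
            result.append((key, n_x * n_y))
    return result
```

For example, with `x_keys = [1, 2, 2, 3]` and `y_keys = [2, 2, 3, 3, 4]`, both functions return `[(2, 4), (3, 2)]`, but the single-job version never materializes the intermediate joined rows.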


Jie

On Tue, Dec 6, 2011 at 7:04 PM, John Sichi (Assigned) (JIRA) <j...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/HIVE-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> John Sichi reassigned HIVE-1772:
> --------------------------------
>
>    Assignee: Navis
>
> > optimize join followed by a groupby
> > -----------------------------------
> >
> >                 Key: HIVE-1772
> >                 URL: https://issues.apache.org/jira/browse/HIVE-1772
> >             Project: Hive
> >          Issue Type: Improvement
> >          Components: Query Processor
> >            Reporter: Namit Jain
> >            Assignee: Navis
> >         Attachments: HIVE-1772.1.patch
> >
> >
> > explain SELECT x.key, count(1) FROM src1 x JOIN src y ON (x.key = y.key) group by x.key;
> > STAGE DEPENDENCIES:
> >   Stage-1 is a root stage
> >   Stage-2 depends on stages: Stage-1
> >   Stage-0 is a root stage
> > The above query issues 2 map-reduce jobs.
> > The first MR job performs the join, whereas the second MR job performs the group by.
> > Since the data is already sorted on the join key (x.key, which is also the group-by key), the group by can be performed in the reducer of the join itself.
