Given a table with 1000 columns:
    col1, col2, ..., col1000

The source table is about 1PB.

I only need to query 3 columns:

select col1, col2, sum(col3) as col3
from myTable
group by
col1, col2


Is it advisable to select only those columns in a subquery first and feed that
into the group-by aggregation, so that less data is sent to the group by?
I am not sure whether Hive takes care of this automatically.

select col1, col2, sum(col3) as col3
from
    (select col1, col2, col3
     from myTable
    ) a
group by
col1, col2
