Given a table with 1000 columns: col1, col2, ..., col1000. The source table is about 1 PB.
I only need to query 3 columns:

    select col1, col2, sum(col3) as col3 from myTable group by col1, col2

Is it advisable to select just those three columns in a subquery first and then pass the result to the group by aggregation, so that less data is sent to the group by? I'm not sure whether Hive takes care of this automatically.

    select col1, col2, sum(col3) as col3 from (select col1, col2, col3 from myTable) a group by col1, col2
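
One way to check this would presumably be to compare the plans Hive produces for the two forms with EXPLAIN. This is only a sketch, reusing the table name myTable from above; I have not confirmed what the plans look like on a table of this size.

```sql
-- Plan for the direct aggregation
explain
select col1, col2, sum(col3) as col3
from myTable
group by col1, col2;

-- Plan for the subquery form
explain
select col1, col2, sum(col3) as col3
from (select col1, col2, col3 from myTable) a
group by col1, col2;
```

If both plans show the same stages and only col1, col2 and col3 being selected after the table scan, then the subquery would seem to add nothing.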