Hi, I have a question if I could use the cluster by clause in a sub query to improve the performance of a group by query in hive
Lets I have a Table A with columns (all strings) col1..col5 and the table is not "Clustered" now I 'm trying to run the below query select > col1, > col2, > col3, > col4, > concat_ws(',', collect_set(col5)) > from A > group by > col1, > col2, > col3, > col4 Would the below query optimize the above query and if not what is the best practice to optimize this query. Assuming only col1 & col2 are the uniquely identifying columns select > ct.col1, > ct.col2, > ct.col3, > ct.col4, > concat_ws(',', collect_set(ct.col5)) > from > ( > select > col1, > col2, > col3, > col4, > col5 > from A > cluster by col1, col2 > ) ct > group by > ct.col1, > ct.col2, > ct.col3, > ct.col4. Thanks for your responses.