Hello everyone, I'm new to Hive and I have some questions.
I have a table like this:

    create table t(id int, time string, ip string, u bigint, ret int, plat int, type int, u2 bigint, ver int)
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

I run a lot of queries on this table, filtering on different values of its columns, like:

    select count(*), count(distinct u), type from t where plat = 1 and dt = "2012-1-12-02" group by type;
    select count(*), count(distinct u), type from t where plat = 2 and dt = "2012-1-12-02" group by type;
    select count(*), count(distinct u), type from t where (type = 2 or type = 6) and dt = "2012-1-12-02" group by type;
    select count(*), count(distinct u), type from t where (type = 1 or type = 5) and dt = "2012-1-12-02" group by type;
    select count(*), count(distinct u), type from t where (type = 1 or type = 5) and (dt = "2012-1-12-02" or dt = "2012-1-12-03") group by type;

These queries do not seem very efficient, because each one queries the same table separately, which means the same files are scanned many times.

My question is: how can I avoid this? Is there a better way to run these queries?

Thank you very much for your help!
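
To make the repeated-scan concern concrete, here is a minimal, untested sketch of how some of the queries above might be folded together so that the dt = "2012-1-12-02" partition is read fewer times. It assumes the table t and the filters from the post. The merged WHERE clauses and the extra plat column in the output are illustrative assumptions, not a verified solution.

```sql
-- Queries 1 and 2 in one pass: group by plat as well as type,
-- so each (plat, type) pair gets its own result row.
SELECT plat, type, count(*), count(DISTINCT u)
FROM t
WHERE dt = '2012-1-12-02' AND plat IN (1, 2)
GROUP BY plat, type;

-- Queries 3 and 4 in one pass: they differ only in which type
-- values they keep, and both already group by type.
SELECT type, count(*), count(DISTINCT u)
FROM t
WHERE dt = '2012-1-12-02' AND type IN (1, 2, 5, 6)
GROUP BY type;
```

Hive also has a multi-insert form (FROM t INSERT OVERWRITE ... SELECT ... INSERT OVERWRITE ... SELECT ...) that lets several SELECTs share one scan of the source table. Whether it helps with these count(distinct u) aggregations would need to be tested.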