query using stats

2014-10-13 Thread Navdeep Agrawal
Hi, I am trying to run a query using stats with the following flags set, but it always runs a MapReduce job instead of getting the result directly from the metastore (Hive 0.13.0.2.1.2.1-471). Can someone please suggest how to run an optimized query, or a workaround for it? Thanks in advance. Set

Re: Query Using Stats

2014-05-16 Thread Edward Capriolo
Hive does not know that the values of column `seconds` and partition `range` are related. Hive can only use the WHERE clause to remove partitions that do not match the range criteria. The data inside each partition is not ordered in any way, so the minimum and maximum seconds could be in
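To make the point about partition pruning concrete, here is a toy Python model (not Hive code; the table layout and the function names `prune_partitions` and `min_max_by_scan` are hypothetical). A filter on the partition key can discard whole partitions using only their keys, but because rows inside a surviving partition are unordered, computing min/max of `seconds` still means reading every row in it.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: partition value of `range` -> unordered `seconds` values.
Table = Dict[int, List[int]]


def prune_partitions(table: Table, lower_bound: int) -> List[int]:
    """Keep only partitions whose key satisfies `range > lower_bound`.

    This step looks only at partition keys, never at the row data.
    """
    return sorted(key for key in table if key > lower_bound)


def min_max_by_scan(table: Table, lower_bound: int) -> Tuple[Dict[int, Tuple[int, int]], int]:
    """Compute (min, max) of `seconds` per surviving partition by reading every row.

    Rows are unordered, so no row can be skipped. Returns the results and the
    number of rows read.
    """
    results: Dict[int, Tuple[int, int]] = {}
    rows_read = 0
    for key in prune_partitions(table, lower_bound):
        rows = table[key]
        if not rows:  # an empty partition contributes no row
            continue
        rows_read += len(rows)
        results[key] = (min(rows), max(rows))
    return results, rows_read


if __name__ == "__main__":
    data: Table = {
        1400204400: [1400204410, 1400204405, 1400204499],
        1400204800: [1400204950, 1400204801, 1400204870],
        1400205100: [1400205300, 1400205150],
    }
    print(min_max_by_scan(data, 1400204700))
    # ({1400204800: (1400204801, 1400204950), 1400205100: (1400205150, 1400205300)}, 5)
```

The first partition is skipped without touching its rows; the other two are read in full, which in Hive corresponds to launching a job over those partitions.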

Re: Query Using Stats

2014-05-16 Thread Bryan Jeffrey
Prasanth, I had the correct flag enabled (see the query in the original email). The issue is that it does not appear to be correctly using partition stats for the calculation. The table is an ORC table. It appears in the log that stats are being calculated, but it does not appear to be working when queries are run a

Re: Query Using Stats

2014-05-16 Thread Prasanth Jayachandran
Bryan, the flag you are looking for is hive.compute.query.using.stats. This optimization is disabled by default, so you need to enable it to use it. Also, the min/max/sum metadata are not looked up from the file but from the metastore. Although file formats like ORC contain stats, they a
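The two conditions described above (the flag must be enabled, and the stats must live in the metastore rather than only in the data files) can be sketched as a toy planner. This is a hypothetical Python model, not Hive's actual optimizer logic, whose real applicability rules may be stricter; `ColumnStats`, `answer_min_max` and `overall_min_max` are illustrative names. In Hive, column statistics are typically written to the metastore by an `ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS` statement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-partition column statistics as a metastore might hold them."""
    min_value: int
    max_value: int


def answer_min_max(
    partitions: List[int],
    metastore: Dict[int, ColumnStats],
    use_stats: bool,
) -> Tuple[str, Optional[Dict[int, Tuple[int, int]]]]:
    """Toy planner: return ("metastore", results) when stored stats alone can
    answer per-partition min/max, otherwise ("scan", None) to signal a full job."""
    # Models the optimization being off unless explicitly enabled
    # (the role hive.compute.query.using.stats plays in the thread).
    if not use_stats:
        return "scan", None
    # Every partition the query touches needs stats in the metastore; stats
    # embedded in data files (e.g. ORC file statistics) are not consulted here.
    if any(p not in metastore for p in partitions):
        return "scan", None
    return "metastore", {
        p: (metastore[p].min_value, metastore[p].max_value) for p in partitions
    }


def overall_min_max(stats: List[ColumnStats]) -> Tuple[int, int]:
    """Without GROUP BY, per-partition stats merge as min of mins, max of maxes."""
    return min(s.min_value for s in stats), max(s.max_value for s in stats)


if __name__ == "__main__":
    store = {1400204800: ColumnStats(1400204801, 1400204950)}
    print(answer_min_max([1400204800], store, use_stats=True))
    # ('metastore', {1400204800: (1400204801, 1400204950)})
    print(answer_min_max([1400204800, 1400205100], store, use_stats=True))
    # ('scan', None)
```

A single partition without metastore stats is enough to force the scan path in this model, which is one way a query can fall back to a MapReduce job even with the flag set.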

Query Using Stats

2014-05-16 Thread Bryan Jeffrey
All, I am executing the following query using Hadoop 2.2.0 and Hive 0.13.0.
/opt/hadoop/latest-hive/bin/beeline -u jdbc:hive2://server:10002/database -n root --hiveconf hive.compute.query.using.stats=true -e "select min(seconds), max(seconds), range from data where range > 1400204700 group by ran