So I have an interesting situation where a select from a view - amazingly i might add - does the right partition elimination in hive v0.80 but in hive v0.10 it does not. instead it seems to perform like a mere mortal would.
Let me explain: 1. view definition: create view v1 as select cast(year*10000 + month*100 + day as int) as date_key, * from big_table_with_partitions; 2. big_table_with_partitions is partitioned by (year int, month int, day int, hour int) 3. the query. select * from v1 where date_key=20130429 limit 10; on hive 0.80 running explain we have this: {code} | Path -> Alias: | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=01[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=02[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=03[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=04[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=05[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=06[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=07[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=08[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=09[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=10[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=11[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=12[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=13[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=14[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=15[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=16[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=17[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=18[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=19[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=20[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=21[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=22[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=23[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour] | Path -> Partition: | hdfs:// namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00 | Partition | base file name: hour=00 | input format: org.apache.hadoop.mapred.SequenceFileInputFormat | output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | partition values: | day 29 | hour 00 | month 04 | year 2013 {code} so hive somehow was able to parse the "date_key=20130429" and figure out that was partition (year=2013/month=04/day=29). in hive version 0.10 it does not and lets just say i have several years of data and after about 30 minutes the explain comes back and every partition is listed in the output (where above it is only one day.) So i guess my question is: how was hive in v0.80 able to do that partition elimination? remember i did not specify year, month nor day which are the partition keys. thanks, Stephen.