Hi, please see this Cloudera blog post: http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/
It says: "Do not over-partition the data. With too many small partitions, the task of recursively scanning the directories becomes more expensive than a full table scan of the table."

Suppose I have two tables with this partition structure:

1. table1 pointing to HDFS location /c1/c2/data
2. table2 pointing to HDFS location /c1/c2/c3/data

hadoop fs -du -h -s /c1 reports the same size for both of them, and the files are splittable and Snappy-compressed.

I am comparing these two queries:

select count(1) from table1;
select count(1) from table2;

For which use cases do the two queries have different execution times? My guess is that both should always perform the same, as long as we don't use the c3 partition column in the WHERE clause. Is that right?

--
-Shubh