Partition performance

Shubhvardhan Manjayya Tue, 26 Jan 2016 20:14:27 -0800

Hi, see this Cloudera blog post:
http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/


It says: "Do not over-partition the data. With too many small
partitions, the task of recursively scanning the directories becomes more
expensive than a full table scan of the table."

If I have two tables with this partition structure:
1. table1 pointing to HDFS location /c1/c2/data
2. table2 pointing to HDFS location /c1/c2/c3/data

and hadoop fs -du -h -s /c1 reports the same size for both of them,

and the files are splittable and Snappy-compressed,

when I compare the two queries,

select count(1) from table1;
select count(1) from table2;

In which use cases would the two queries have different execution times? I am
guessing both should always perform the same, as long as we don't use the c3
partition column in the WHERE clause?
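
For concreteness, the two layouts could be declared roughly as in the sketch below. This is only an illustration of the setup, not the real schema: the data column payload, the STRING types, and the LOCATION paths are placeholders I made up, and I am assuming c1, c2, and c3 are the partition columns.

```sql
-- Hypothetical DDL: `payload`, the STRING types, and the LOCATION paths
-- are placeholders, not the actual schema.

-- table1: two partition levels, so data sits under .../c1=<v>/c2=<v>/
CREATE EXTERNAL TABLE table1 (payload STRING)
PARTITIONED BY (c1 STRING, c2 STRING)
LOCATION '/warehouse/table1';

-- table2: same data, with one extra partition level (c3)
CREATE EXTERNAL TABLE table2 (payload STRING)
PARTITIONED BY (c1 STRING, c2 STRING, c3 STRING)
LOCATION '/warehouse/table2';

-- Listing the partitions shows how many directories each table has
-- registered, even though the total data size is the same.
SHOW PARTITIONS table1;
SHOW PARTITIONS table2;
```

Neither count query filters on a partition column, so both would have to read every partition of their table.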

-- 

-Shubh
