Small files under SequenceFile table partition directories

2015-11-10 Thread reveen joe
Hi,
Most of our Hive tables are SequenceFile tables, and there are currently many small files ranging from *1-4 MB* under the partition directories (created by insert-overwrite). I am assuming this is due to two reasons: 1. Some of our tables are bucketed, so individual files are created for each bucket…
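The first reason, bucketing, can be checked with simple arithmetic. The sketch below assumes that an insert-overwrite into a bucketed table writes one file per bucket in each partition and that rows are spread evenly across buckets. The function name `bucket_file_stats`, the partition size, and the bucket count are all hypothetical, chosen only to show how a modest partition can end up as many files in the 1-4 MB range.

```python
from typing import Tuple

MB = 1024 * 1024


def bucket_file_stats(partition_bytes: int, num_buckets: int) -> Tuple[int, float]:
    """Return (files per partition, average file size in MB).

    Hypothetical model: one output file per bucket per partition, with
    rows distributed evenly across buckets (real data is rarely this even).
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    avg_mb = partition_bytes / num_buckets / MB
    return num_buckets, avg_mb


# Hypothetical partition: 128 MB of SequenceFile data in a 32-bucket table.
files, avg = bucket_file_stats(128 * MB, 32)
print(files, avg)  # 32 4.0
```

Under these assumptions the average file size scales as partition size divided by bucket count, so doubling the buckets halves the file size.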

Re: Compare Query Execution duration between ORC and SequenceFile

2015-11-10 Thread reveen joe
> Hi,
>
> I understand that data retrieval against an ORC table can be much faster
> than against a SequenceFile table when a *subset of columns* is selected.
>
> I am assuming query execution would be faster even when *all the
> columns* in a given partition are selected, but I am not very sure about…

Order of Partition column and Non Partition column in the WHERE clause

2015-05-19 Thread reveen joe
Hello,
Would the order of the partition column in the WHERE clause matter for performance? For example, would there be any difference in performance between the queries below?
select a from table where part_column = 'y' and non_part_column = 'z'
or
select a from table where non_part_column = 'z' …
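One way to reason about the question is a toy model of partition pruning. In this model the planner first collects the equality predicates on partition columns from the AND-ed WHERE clause and uses them to pick partitions, then applies the remaining predicates row by row. This is a simplified sketch, not Hive's actual pruner; `prune_partitions`, the `Conjunct` representation, and the partition list are hypothetical. Under this assumption, reordering the conjuncts cannot change which partitions are read.

```python
from typing import Dict, List, Set, Tuple

# A conjunct is (column, value), meaning "column = value". A WHERE clause is
# modelled as a list of conjuncts joined by AND (hypothetical toy model).
Conjunct = Tuple[str, str]


def prune_partitions(where: List[Conjunct], partition_cols: Set[str],
                     partitions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep partitions that satisfy every equality predicate on a partition column.

    Predicates on non-partition columns are ignored here; in this model they
    are applied later to the rows read from the kept partitions.
    """
    part_preds = [(c, v) for c, v in where if c in partition_cols]
    return [p for p in partitions if all(p.get(c) == v for c, v in part_preds)]


partitions = [{"part_column": "x"}, {"part_column": "y"}, {"part_column": "z"}]
q1 = [("part_column", "y"), ("non_part_column", "z")]
q2 = [("non_part_column", "z"), ("part_column", "y")]

print(prune_partitions(q1, {"part_column"}, partitions))  # [{'part_column': 'y'}]
print(prune_partitions(q2, {"part_column"}, partitions))  # [{'part_column': 'y'}]
```

Because the model treats the WHERE clause as an unordered set of AND-ed conditions for pruning purposes, both queries select the same single partition.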