Hi,
Most of our Hive tables are SequenceFile tables, and there are currently
many small files, ranging from *1-4 MB*, under the partition directories
(created by insert-overwrite). I am assuming this is due to 2 reasons:
1. Some of our tables are Bucketed and so individual files are created for
each b
> Hi,
>
> I understand that data retrieval against an ORC table can be much faster
> than against a SequenceFile table when only a *subset of columns* is selected.
>
> I am assuming query execution would be faster even when *all the
> columns* in a given partition are selected but not very sure abou
Hello,
Would the order of the partition column in the where clause matter for
performance?
For example, would there be any difference in performance between the queries below?
select a from table where part_column = 'y' and non_part_column = 'z'
or
select a from table where non_part_column = 'z'