> If I have an ORC table bucketed and sorted on a column, where does Hive keep 
> the mapping from column value to bucket? Specifically, if I know the column 
> value and need to find the specific HDFS file, is there an API to do this?

Hive does not store that mapping; the bucket is recomputed from a hash of the column value. The closest thing to an API is ObjectInspectorUtils.getBucketNumber().
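To illustrate what that computation looks like, here is a standalone sketch that does not call Hive. It assumes the legacy bucketing scheme (bucketing_version=1), where an int column hashes to its own value, a bigint hashes like Long.hashCode(), multiple bucket columns are combined with 31*h + x, and the sign bit is masked before taking the remainder. Newer Hive releases can use a different (Murmur3-based) hash, so for real lookups call ObjectInspectorUtils.getBucketNumber() with the table's own object inspectors rather than this code.

```java
import java.util.List;

/**
 * Sketch of legacy (bucketing_version=1) Hive bucket assignment.
 * Assumption: only int/bigint bucket columns, hashed the way older Hive did.
 * Not Hive's code; use ObjectInspectorUtils.getBucketNumber() in practice.
 */
public class BucketSketch {

    // Hash of one column value under the assumed legacy scheme.
    static int legacyHash(Object value) {
        if (value == null) {
            return 0; // assumed: nulls hash to 0
        }
        if (value instanceof Integer) {
            return (Integer) value; // an int hashes to itself
        }
        if (value instanceof Long) {
            long v = (Long) value;
            return (int) (v ^ (v >>> 32)); // same as Long.hashCode()
        }
        throw new IllegalArgumentException(
                "sketch only handles int/bigint: " + value.getClass());
    }

    // Combine several bucketing columns with the 31*h + x pattern.
    static int bucketHash(List<Object> bucketValues) {
        int h = 0;
        for (Object v : bucketValues) {
            h = 31 * h + legacyHash(v);
        }
        return h;
    }

    // Mask the sign bit, then take the remainder modulo the bucket count.
    static int bucketNumber(List<Object> bucketValues, int numBuckets) {
        return (bucketHash(bucketValues) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketNumber(List.of(42), 8)); // 42 % 8 = 2
        System.out.println(bucketNumber(List.of(-1), 8)); // 2147483647 % 8 = 7
    }
}
```

Masking with Integer.MAX_VALUE rather than calling Math.abs() avoids the Integer.MIN_VALUE edge case, where Math.abs() would still return a negative number.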

The Tez bucket pruning optimizer is a good example of how it can be used.

For a query like "select * from table where id=?", it prunes every bucket except 
the one that can contain that id, provided the table is bucketed on id.
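A rough sketch of the last step, going from a bucket number to the file to read, is below. BucketFilePicker and its methods are hypothetical names. It assumes the legacy int hash from above (bucket = (id & Integer.MAX_VALUE) % numBuckets) and non-ACID bucket files named like 000002_0, where the leading digits are the bucket number. Check the actual layout of your table directory before relying on either assumption.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: keep only the file(s) for one bucket, as a point
 * lookup on the bucket column would. Assumes non-ACID file names like
 * "000002_0" and the legacy int hash (an int hashes to itself).
 */
public class BucketFilePicker {

    // Leading digits before the first underscore are taken as the bucket id.
    private static final Pattern BUCKET_PREFIX = Pattern.compile("^(\\d+)_\\d+");

    // Assumed legacy bucket number for a single int bucket column.
    static int legacyIntBucket(int id, int numBuckets) {
        return (id & Integer.MAX_VALUE) % numBuckets;
    }

    // Parse the bucket id from a file name, or -1 if it does not match.
    static int bucketIdOf(String fileName) {
        Matcher m = BUCKET_PREFIX.matcher(fileName);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Drop every file whose bucket id differs from the target bucket.
    static List<String> filesForBucket(List<String> fileNames, int bucket) {
        return fileNames.stream()
                .filter(name -> bucketIdOf(name) == bucket)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing = List.of("000000_0", "000001_0", "000002_0", "000003_0");
        int bucket = legacyIntBucket(6, 4); // 6 % 4 = 2
        System.out.println(filesForBucket(listing, bucket)); // [000002_0]
    }
}
```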

Planning side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java#L223

Execution side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L233

Cheers,
Gopal

