> If I have an ORC table bucketed and sorted on a column, where does Hive keep
> the mapping from column value to bucket? Specifically, if I know the column
> value and need to find the specific HDFS file, is there an API to do this?

The closest thing to an API is ObjectInspectorUtils.getBucketNumber().

The Tez bucket pruning optimizer should be helpful in understanding how that
can be used. It prunes all other buckets for a query like
"select * from table where id=?" if the table is bucketed on id.

Planning side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java#L223

Execution side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L233

Cheers,
Gopal
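
For illustration, here is a rough, self-contained sketch of the idea behind that lookup. It does not call Hive at all. As far as I know there is no stored value-to-bucket mapping: the bucket is derived from a hash of the column value, taken modulo the bucket count. The class and method names below (BucketSketch, bucketFor, bucketFilePrefix) are made up for this example. The sketch assumes the older hashing scheme, where an int column hashes to its own value; newer Hive versions can use a different hash function, so for real use defer to ObjectInspectorUtils.getBucketNumber() in your Hive version. The "000002_"-style file prefix is likewise an assumption about how plain INSERT-written bucket files are usually named, so check it against a directory listing of your table.

```java
public class BucketSketch {

    // Map a hash code to a bucket: clear the sign bit so the result is
    // never negative, then take it modulo the number of buckets.
    // Assumption: this mirrors the scheme Hive applies after hashing.
    static int bucketFor(int hashCode, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Hypothetical helper: bucket files are typically named with a
    // zero-padded six-digit bucket id followed by "_<attempt>".
    static String bucketFilePrefix(int bucket) {
        return String.format("%06d_", bucket);
    }

    public static void main(String[] args) {
        int id = 42;          // value of the bucketing column
        int numBuckets = 8;   // from CLUSTERED BY ... INTO 8 BUCKETS

        // Assumption: for an int column the hash is the value itself
        // (older bucketing scheme only).
        int bucket = bucketFor(id, numBuckets);
        System.out.println("bucket = " + bucket
                + ", look for files starting with " + bucketFilePrefix(bucket));
    }
}
```

With the prefix in hand, you would list the table (or partition) directory on HDFS and pick the file whose name starts with it.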