> So I am questioning whether it is enabled on the version I am on, which
> is 0.14. Does anyone know?

https://issues.apache.org/jira/browse/HIVE-9188 - fix-version (1.2.0)

The version you are using does not have bloom filter support. It should be ignoring the parameter and not generating any bloom filter streams when writing.

hive --orcfiledump (in later versions) will print the BLOOM_FILTER as a column next to the row index streams.

> Without any optimization, I have to use thousands of mappers to find
> just one id.

Everything else you are doing is appropriate; however, be aware that the bloom filter index (& row-index) is consulted only *after* a mapper starts up.

So it might still spin up a mapper, but it might exit immediately, which plays well into Tez container reuse for very busy clusters - in fact, it might be faster in a busy cluster than a completely idle one.

The sorted[1] min-max indicators suggested by Prasanth, however, are actually rolled up to the split-level & can be used to prune splits before being scheduled.

Cheers,
Gopal

[1] - only CLUSTER BY needed, not ORDER BY
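
To illustrate the difference between the two pruning levels described above, here is a rough Python sketch. It is a toy model under stated assumptions, not Hive or ORC code: the names BloomFilter, build_split, plan_splits and run_mapper are hypothetical, a "split" is just a list of ids with min/max stats, and the filter sizes are arbitrary. It only shows that min/max stats can drop splits at planning time when the data is clustered, while bloom filters help only once a mapper is already running.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_split(ids, rows_per_group=25):
    """Hypothetical split: min/max stats plus one bloom filter per row group."""
    groups = []
    for i in range(0, len(ids), rows_per_group):
        chunk = ids[i:i + rows_per_group]
        bloom = BloomFilter()
        for value in chunk:
            bloom.add(value)
        groups.append({"rows": chunk, "bloom": bloom})
    return {"min": min(ids), "max": max(ids), "groups": groups}


def plan_splits(splits, key):
    """Planning time: keep only splits whose min/max range can hold the key."""
    return [s for s in splits if s["min"] <= key <= s["max"]]


def run_mapper(split, key):
    """Task time: the mapper has started; bloom filters skip row groups."""
    hits, groups_read = [], 0
    for group in split["groups"]:
        if not group["bloom"].might_contain(key):
            continue  # skipped without reading any rows
        groups_read += 1
        hits.extend(v for v in group["rows"] if v == key)
    return hits, groups_read


ids = list(range(1000))
# Clustered on id: each split covers a narrow, disjoint id range.
clustered = [build_split(ids[i:i + 100]) for i in range(0, 1000, 100)]
# Not clustered: every split spans almost the whole id range.
scattered = [build_split(ids[j::10]) for j in range(10)]

print("clustered splits scheduled:", len(plan_splits(clustered, 342)))
print("scattered splits scheduled:", len(plan_splits(scattered, 342)))
```

In the scattered layout every split still gets a mapper, and each mapper then relies on its bloom filters to skip row groups and exit early; in the clustered layout most splits are never scheduled at all.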