Prasanth Jayachandran created HIVE-9188:
-------------------------------------------

             Summary: BloomFilter in ORC row group index
                 Key: HIVE-9188
                 URL: https://issues.apache.org/jira/browse/HIVE-9188
             Project: Hive
          Issue Type: New Feature
          Components: File Formats
    Affects Versions: 0.15.0
            Reporter: Prasanth Jayachandran
            Assignee: Prasanth Jayachandran


Bloom filters are a well-known probabilistic data structure for set membership
checking. We can use Bloom filters in the ORC index for better row group
pruning. Currently, the ORC row group index uses min/max statistics to
eliminate row groups (and stripes) that cannot satisfy the predicate specified
in the query. But in some cases min/max-based elimination is ineffective, for
example on unsorted columns with a wide range of values, where almost every row
group's [min, max] range covers the value being searched for. Bloom filters can
be an effective and efficient alternative for row group/split elimination for
point queries or queries with an IN clause.
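
To make the idea concrete, here is a minimal Python sketch of row group
elimination with and without a Bloom filter. It is an illustration only, not
the proposed ORC implementation: the class and function names (BloomFilter,
build_index, groups_to_read), the SHA-256 based double hashing, and the default
false positive rate of 0.05 are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: k hash probes into an m-bit array (illustrative only)."""

    def __init__(self, expected_entries, fpp=0.05):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        # Assumes expected_entries >= 1.
        n = expected_entries
        self.num_bits = max(8, math.ceil(-n * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit pieces of one
        # digest. The hash choice is an assumption made for this sketch.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def build_index(row_groups):
    """Build (min, max, bloom filter) per non-empty row group for one column."""
    index = []
    for values in row_groups:
        bf = BloomFilter(expected_entries=len(values))
        for v in values:
            bf.add(v)
        index.append((min(values), max(values), bf))
    return index


def groups_to_read(index, wanted, use_bloom):
    """Return ids of row groups that may hold any value in `wanted` (an IN list)."""
    selected = []
    for gid, (lo, hi, bf) in enumerate(index):
        # Min/max check: keep only the values that fall inside the group's range.
        candidates = [w for w in wanted if lo <= w <= hi]
        if use_bloom:
            # Bloom check: drop values the filter proves are absent.
            candidates = [w for w in candidates if bf.might_contain(w)]
        if candidates:
            selected.append(gid)
    return selected


# Unsorted column: every row group spans almost the whole value range.
row_groups = [[1, 500, 999], [2, 400, 998], [3, 777, 997]]
index = build_index(row_groups)
minmax_only = groups_to_read(index, [777], use_bloom=False)
with_bloom = groups_to_read(index, [777], use_bloom=True)
```

With min/max alone, all three row groups must be read for the point query on
777, because each range covers it. The Bloom filter pass can only shrink that
set: it never drops the group that really contains 777 (no false negatives),
though it may occasionally keep a group that does not (false positives).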



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
