[ https://issues.apache.org/jira/browse/KUDU-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124250#comment-17124250 ]
Bankim Bhavsar commented on KUDU-3140:
--------------------------------------

Impala's HDFS scanner for Parquet maintains per-predicate statistics. Every 16 blocks, it checks the effectiveness of each filter. If the filter's rejection ratio is below 10% (the default), the filter is disabled.

Code pointers:
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner.cc#L775
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner.h#L138
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner-ir.cc#L102

> Add heuristics to disable predicate evaluation/filtering for Bloom filter
> predicate
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-3140
>                 URL: https://issues.apache.org/jira/browse/KUDU-3140
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, util
>    Affects Versions: 1.12.0
>            Reporter: Bankim Bhavsar
>            Assignee: Bankim Bhavsar
>            Priority: Major
>
> KUDU-2483 introduced support for Bloom filter predicates.
> However, TPC-H query 9 regresses when Bloom filter predicates are
> pushed down to Kudu.
> See the excerpt from the performance analysis of a TPC-H run by [~wzhou]:
> https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
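The effectiveness heuristic from the comment works as follows: keep per-predicate counts, check them every 16 blocks, and disable the predicate when it rejects less than 10% of rows. It can be sketched in C++ as shown below. This is a minimal illustration under stated assumptions. It is not Impala's implementation (that lives at the code pointers in the comment), and it is not a proposed Kudu patch. The class name `PredicateStats`, the method `RecordBlock`, and the use of cumulative counts rather than a sliding window are all hypothetical choices made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-predicate statistics illustrating the "disable an
// ineffective filter" heuristic. Not Impala's or Kudu's actual code.
class PredicateStats {
 public:
  PredicateStats(int blocks_per_check, double min_reject_ratio)
      : blocks_per_check_(blocks_per_check),
        min_reject_ratio_(min_reject_ratio) {
    assert(blocks_per_check > 0);
  }

  // Whether the predicate should still be evaluated.
  bool enabled() const { return enabled_; }

  // Records the outcome of evaluating the predicate over one block:
  // `considered` rows were tested and `rejected` of them were filtered out.
  void RecordBlock(int64_t considered, int64_t rejected) {
    if (!enabled_) return;  // A disabled predicate is no longer evaluated.
    considered_ += considered;
    rejected_ += rejected;
    // Only re-check effectiveness once every `blocks_per_check_` blocks.
    if (++blocks_seen_ % blocks_per_check_ != 0) return;
    // Assumption: the ratio is computed over all rows seen so far.
    if (considered_ > 0 &&
        static_cast<double>(rejected_) / static_cast<double>(considered_) <
            min_reject_ratio_) {
      enabled_ = false;
    }
  }

 private:
  const int blocks_per_check_;
  const double min_reject_ratio_;
  int64_t blocks_seen_ = 0;
  int64_t considered_ = 0;
  int64_t rejected_ = 0;
  bool enabled_ = true;
};
```

With `PredicateStats(16, 0.10)`, consider a predicate that rejects 51 of every 1024 rows (about 5%). It stays enabled for the first 15 blocks and is disabled when the 16th block triggers the check. A predicate that rejects half of its rows stays enabled. Checking only periodically keeps the bookkeeping cheap on the hot path, at the cost of evaluating an ineffective filter for up to 16 blocks before it is turned off.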