[jira] [Created] (KYLIN-5564) Introduce Bloom Filter to optimize data scanning based on Spark

Guangyuan Feng (Jira) Wed, 07 Jun 2023 02:29:07 -0700

Guangyuan Feng created KYLIN-5564:
-------------------------------------

             Summary: Introduce Bloom Filter to optimize data scanning based on Spark
                 Key: KYLIN-5564
                 URL: https://issues.apache.org/jira/browse/KYLIN-5564
             Project: Kylin
          Issue Type: Improvement
          Components: Query Engine
    Affects Versions: 5.0-alpha
            Reporter: Guangyuan Feng
            Assignee: Guangyuan Feng
             Fix For: 5.0-alpha



Currently, all the data generated by Kylin are saved as Parquet files through
Spark, but Kylin has not made full use of Parquet's features when scanning
data. Among these features, the Bloom filter deserves particular attention,
because it is one of the most common tools that help readers skip data that
cannot match a query.
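To illustrate why a Bloom filter lets a reader skip data, here is a minimal sketch
in plain Python. It assumes one filter per row group and an equality predicate. It
uses a classic Bloom filter with double hashing, not Parquet's split-block Bloom
filter, and the names `BloomFilter`, `build_row_group` and `scan_equals` are
hypothetical. A filter can return false positives but never false negatives, so a
row group whose filter says "absent" can be skipped without being read.

```python
import hashlib


class BloomFilter:
    """Minimal classic Bloom filter (illustrative; not Parquet's split-block variant)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_row_group(values, num_bits=1024, num_hashes=5):
    """Hypothetical row group: its rows plus a Bloom filter built at write time."""
    bf = BloomFilter(num_bits, num_hashes)
    for v in values:
        bf.add(v)
    return {"rows": list(values), "bloom": bf}


def scan_equals(row_groups, target):
    """Evaluate `col = target`, returning the matches and how many row groups were read."""
    matches, groups_read = [], 0
    for rg in row_groups:
        if not rg["bloom"].might_contain(target):
            continue  # definitely absent: skip this row group entirely
        groups_read += 1
        matches.extend(r for r in rg["rows"] if r == target)
    return matches, groups_read
```

With ten row groups holding disjoint value ranges, a point lookup usually reads
only the one row group that contains the value. Occasionally it reads one more,
when another group's filter returns a false positive.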

Therefore, we introduced an approach that builds Bloom filters automatically
and conditionally while building segments, choosing the target columns
mainly according to the query history.
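The ticket does not describe how columns are chosen from query history. The
following is only a hypothetical heuristic, not Kylin's implementation. It favours
columns that often appear in equality or IN filters and that have high
cardinality, because min/max statistics alone rarely skip row groups for such
columns. It then emits per-column writer properties in the
`parquet.bloom.filter.enabled#<column>` and
`parquet.bloom.filter.expected.ndv#<column>` form that recent parquet-mr
releases accept. The names `pick_bloom_columns` and `bloom_writer_options`, and
the `top_k` and `min_ndv` thresholds, are assumptions made for illustration.

```python
from collections import Counter


def pick_bloom_columns(query_filters, cardinality, top_k=2, min_ndv=1000):
    """Hypothetical heuristic: rank columns by how often past queries filtered on
    them, keeping only columns with at least `min_ndv` distinct values."""
    counts = Counter(col for filters in query_filters for col in filters)
    candidates = [c for c, _ in counts.most_common() if cardinality.get(c, 0) >= min_ndv]
    return candidates[:top_k]


def bloom_writer_options(columns, cardinality):
    """Build per-column parquet-mr Bloom filter writer properties."""
    opts = {}
    for col in columns:
        opts[f"parquet.bloom.filter.enabled#{col}"] = "true"
        # The expected number of distinct values lets the writer size the filter.
        opts[f"parquet.bloom.filter.expected.ndv#{col}"] = str(cardinality[col])
    return opts


# Columns used in filters by past queries (hypothetical history).
history = [["seller_id"], ["seller_id", "country"], ["buyer_id"],
           ["seller_id"], ["buyer_id"], ["country"]]
ndv = {"seller_id": 500_000, "buyer_id": 2_000_000, "country": 200}
chosen = pick_bloom_columns(history, ndv)
options = bloom_writer_options(chosen, ndv)
```

In PySpark, a dictionary like `options` could then be passed to the writer with
`df.write.options(**options).parquet(path)`. Here `country` is excluded despite
being queried often, because with only 200 distinct values a Bloom filter would
rarely rule out a row group.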

With Bloom filters in place, Spark can skip more data during scans, which
improves query performance in most cases.
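The gain holds "in most cases" rather than always. A Bloom filter only helps
point-style predicates such as equality and IN, not range predicates. It also has
a false-positive rate, so some irrelevant row groups are still read. The sketch
below uses the standard sizing formulas for a classic Bloom filter,
m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. It also shows the
expected number of row groups read for a point lookup. Parquet's split-block filter
behaves similarly but not identically, and the function names are hypothetical.

```python
import math


def bloom_bits(ndv: int, fpp: float) -> int:
    """Bits a classic Bloom filter needs for `ndv` distinct values at false-positive rate `fpp`."""
    return math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))


def optimal_hashes(num_bits: int, ndv: int) -> int:
    """Number of hash functions that minimises the false-positive rate."""
    return max(1, round(num_bits / ndv * math.log(2)))


def expected_row_groups_read(total: int, matching: int, fpp: float) -> float:
    # Row groups that really contain the value must be read; each of the
    # others is read only when its filter returns a false positive.
    return matching + fpp * (total - matching)


bits = bloom_bits(1_000_000, 0.01)   # roughly 9.6 million bits, about 1.2 MB
hashes = optimal_hashes(bits, 1_000_000)
reads = expected_row_groups_read(100, 1, 0.01)
```

Lowering the target false-positive rate skips more row groups but makes each
filter larger, which is one reason to enable filters only on selected columns.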



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
