Guangyuan Feng created KYLIN-5564:
-------------------------------------

Summary: Introduce Bloom Filter to optimize data scanning based on Spark
Key: KYLIN-5564
URL: https://issues.apache.org/jira/browse/KYLIN-5564
Project: Kylin
Issue Type: Improvement
Components: Query Engine
Affects Versions: 5.0-alpha
Reporter: Guangyuan Feng
Assignee: Guangyuan Feng
Fix For: 5.0-alpha

Currently, all data generated by Kylin is saved as Parquet files through Spark, but Kylin does not make full use of Parquet's features when scanning that data. Bloom filters deserve particular attention, because they are one of the most common tools for letting readers skip data that cannot match a query. We therefore introduce an approach that builds Bloom filters automatically and selectively when constructing segments, choosing the target columns mainly from the query history. With Bloom filters in place, Spark scans should perform better in most cases.
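
As a rough illustration of the idea (not the implementation proposed in this issue), the sketch below picks Bloom filter columns from a query history and turns them into Parquet writer options. The function name, its parameters, and the thresholds are all hypothetical; only the `parquet.bloom.filter.enabled#<column>` and `parquet.bloom.filter.expected.ndv#<column>` keys are existing Parquet writer settings.

```python
from collections import Counter


def bloom_filter_options(filtered_columns, column_ndv, min_hits=2, max_columns=3):
    """Build Parquet writer options for frequently filtered columns.

    filtered_columns: column names, one entry per time a past query filtered
        on that column with an equality or IN predicate (hypothetical input).
    column_ndv: estimated number of distinct values per column.
    min_hits, max_columns: illustrative thresholds, not values from Kylin.
    """
    hits = Counter(filtered_columns)
    options = {}
    selected = 0
    # Visit columns from most to least frequently filtered.
    for column, count in hits.most_common():
        if selected >= max_columns:
            break
        # Skip rarely filtered columns and columns without an NDV estimate.
        if count < min_hits or column not in column_ndv:
            continue
        options[f"parquet.bloom.filter.enabled#{column}"] = "true"
        options[f"parquet.bloom.filter.expected.ndv#{column}"] = str(column_ndv[column])
        selected += 1
    return options
```

With a Spark DataFrame `df`, the resulting dictionary could be passed along as `df.write.options(**opts).parquet(path)`, assuming a Spark build that bundles Parquet 1.12 or later. Bloom filters mainly help equality and IN predicates on high-cardinality columns, which is why the sketch counts only those filters.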