Re:[Discuss] Adopt the new codebase as Apache Kylin 5.0

Xiaoxiang Yu Mon, 13 Feb 2023 22:22:08 -0800

A formatted version of the discussion with the same content:

## Background ##


As we discussed on the mailing list[2] last year, Kylin 4.0 has achieved its 
goals of a new storage format (columnar files) and a new query engine (Spark 
based), and has gained some adoption in the community. But because of the old 
design inherited from early versions, Kylin 4.0 still keeps some of their 
limitations, such as a cap of 63 dimensions and a cube structure that cannot be 
modified once built. We think the only way to remove those limitations is a 
complete redesign, especially of the metadata.

The good news is that Kyligence started doing this years ago, and its 
commercial version has been verified by many customers in terms of 
functionality, performance and stability. Last year, Kyligence open sourced its 
core under Apache License v2.0 and signed a CCLA with the Apache Software 
Foundation. We staged it in a separate branch of the GitHub repository for 
review[1]. Engineers from other teams, such as eBay, also reviewed the codebase 
and put forward many new ideas. We think that, based on this codebase, Kylin 
will gain not only a flexible metadata design and a faster computing engine, 
but also richer user scenarios.

The new codebase has the following features compared with the latest release 
(Kylin 4.0.3):

- More flexible and enhanced data model
    * Allow adding new dimensions and measures to the existing data model
    * The model adapts to table schema changes while retaining the existing 
      index on a best-effort basis
    * Support last-mile data transformation using Computed Column
    * Support raw query (non-aggregation query) using Table Index
    * Support slowly changing dimension tables (SCD2)
- Simplified metadata design
    * Merge DataModel and CubeDesc into new DataModel
    * Add DataFlow for more generic data sequences, e.g. streaming-like data flows
    * New metadata AuditLog for better cache synchronization
- More flexible index management
    * Add IndexPlan to support flexible index management
    * Add IndexEntity to support different index types
    * Add LayoutEntity to support different storage layouts of the same Index
- Toward a native and vectorized query engine
    * Experiment: Integrate with a native execution engine, leveraging Gluten
    * Support async query
    * Enhance cost-based index optimizer
- More
    * Build engine refactoring and performance optimization
    * New web UI based on Vue.js, a brand new front-end framework, to replace 
      AngularJS
    * Smooth modeling process on one canvas




## Proposal ##
So, I'd like to propose adopting the new codebase from Kyligence as Kylin's 
future codebase, i.e., Kylin 5. If accepted, we will request IP clearance 
through the Apache Incubator as the next step.





## Reference ##
[1] https://github.com/apache/kylin/tree/kylin5
[2] https://lists.apache.org/thread/4fkhyw1fyf0jg5cb18v7vxyqbn6vm3zv


--

Best wishes to you!
From: Xiaoxiang Yu





At 2023-02-14 14:09:31, "Xiaoxiang Yu" <x...@apache.org> wrote:
>[...]
>Reference
>https://github.com/apache/kylin/tree/kylin5
>https://lists.apache.org/thread/4fkhyw1fyf0jg5cb18v7vxyqbn6vm3zv
>https://kylin.apache.org/5.0/blog/introduction_of_metastore_cn
