Thanks, Xiaoxiang, for raising this discussion. From my point of view, it’s time for our Kylin community to take a big step to make Kylin great again. I totally agree with Xiaoxiang’s proposal to adopt the new codebase in the branch https://github.com/apache/kylin/tree/kylin5, which was mainly contributed by Kyligence.
The Kylin team inside eBay has also been working on that branch together with the contributors from Kyligence for nearly half a year. We just made an internal alpha release based on that branch and got lots of positive feedback from our trial users.

The main breakthrough of Kylin5 is the change from a cube-based to an index-based design. We can therefore focus on making Kylin an excellent index management system. Compared to Kylin4, the current Kylin5 already has the following advantages:

* Low barrier to creating a new index model. Indexes are optional and can be adjusted iteratively.
* Easy and transparent index model upgrades. Dimensions and measures of an existing index model can be changed with no downtime.
* Raw data queries are supported by a new kind of index, the projection index.
* No limit on the number of dimensions for the aggregation index (previously the cuboid).

Meanwhile, the Kylin5 code has been refactored so that it is much easier to introduce different execution engines for querying Kylin indexes. The Kylin team inside eBay has been working on a native vectorized execution engine (DataFusion+Ballista) with the Apache Arrow community for more than one year, and it is used in our internal alpha release. We got an excellent result: benchmarking on the SSB test data set (1 TB, 6 billion rows), our version achieves on average a 20x query performance gain compared to Kylin3.

[Inline image: image001.png]

Also, thanks, George, for raising your concerns. Actually, eBay Kylin has similar issues, as we are still using Kylin3. We are mainly facing three kinds of challenges:

1. How to deal with the incompatibility of interfaces between different Kylin versions?
2. How to make iteration of the index model more intelligent and transparent for users?
3. How to migrate Kylin3’s metadata and data to Kylin5?

For the first challenge, the eBay Kylin team has been designing and implementing Kylin’s own SQL grammar for all DDL, DML, DQL and other commands. The SQL grammar should be standard and easy to understand. All future Kylin versions should follow that SQL grammar, so that users can use the same SQL across different versions of Kylin and don’t need to worry about incompatibility. We have finished an initial version, which is now under testing and verification, and we will raise issues and PRs to the community soon.

For the second challenge, the eBay Kylin team has been reimplementing a two-phase index planner for Kylin5 based on Kylin3’s two-phase cube planner. The basic idea is as follows:

1. Do index recommendation based on data, like the row count of an aggregation index
2. Do index recommendation based on user behavior, like query statistics

The whole iteration process of the index model will be transparent to users. Currently, we have proposed a PR for phase one, and it’s under review.

For the third challenge, the eBay Kylin team has created tools for metadata migration from Kylin3 to Kylin5, and we will raise a PR to the community soon.

Overall, I give +1 to this proposal of making Kylin5 the codebase for Apache Kylin.

Best regards,
Yanghong Zhong 钟阳红
Apache Kylin Committer, PMC,
Apache Arrow Committer,
Email: nju_y...@apache.org

From: hit_la...@126.com <hit_la...@126.com> on behalf of Xiaoxiang Yu <x...@apache.org>
Date: Thursday, February 16, 2023 at 14:29
To: dev@kylin.apache.org <dev@kylin.apache.org>
Subject: Re: [Discuss] Adopt the new codebase as Apache Kylin 5.0

1. Would it be possible for users of Kylin 2-4 to upgrade their metadata to Kylin5 easily?

I was talking with some early users of the new codebase. They told me that they plan to upgrade from Kylin 3 to the new codebase (Kylin 5), and that they plan to develop and contribute metadata upgrade tools.
So I think this issue will be solved soon.

2. Would the structures (URL, request and response) of the RESTful APIs in Kylin 2-4 be kept?

It is a good question, but I have to say that most REST APIs have been rewritten, so they are called in a new way. I think the new REST documentation will partially help with this.

3. Has any benchmark test been done?

I think I will do a benchmark next month.

4. Are features like Realtime Cubing and Cube Planner in Kylin3 included in Kylin5?

Kafka streaming, real-time cubing and the JDBC source are implemented in a new way, so the previous code no longer exists. For the cube planner, Liu Kun is working hard to implement it in the new codebase (see https://github.com/apache/kylin/pull/2089).

--
Best wishes to you !
From: Xiaoxiang Yu

At 2023-02-15 21:03:40, "George Ni" <n...@apache.org> wrote:
>Hi,
>
>Overall, I'd like to give +1 to this proposal, for Kylin5 has implemented
>a lot of significant breakthroughs. Below are some of my questions:
>
>1. Would it be possible for users of Kylin 2-4 to upgrade their metadata to
>Kylin5 easily?
>2. Would the structures (URL, request and response) of the RESTful APIs in
>Kylin 2-4 be kept?
>3. Has any benchmark test been done?
>4. Are features like Realtime Cubing and Cube Planner in Kylin3 included
>in Kylin5?
>
>Li Yang <liy...@apache.org> wrote on Wed, Feb 15, 2023 at 17:23:
>
>> As Xiaoxiang mentioned, the code donation has a lot of improvements
>> compared to current Kylin 4. Many are long wanted, like
>>
>> - The flexible model can greatly improve the smoothness of adding new
>> dimensions in a production environment.
>> - The computed column can bridge the gap of last-mile data transformation.
>> - The new model metadata design is more friendly to dynamic
>> indexing.
>> - Support of 63+ dimensions.
>>
>> Accepting this code base is a good thing for the whole Kylin community.
>>
>> Cheers
>> Yang
>>
>>
>> On Tue, Feb 14, 2023 at 10:46 PM ShaoFeng Shi <shaofeng...@apache.org>
>> wrote:
>>
>> > The current limitations are very difficult to solve in normal ways. For
>> > example, the Cuboid ID is represented by a Long number, which is 64 bits,
>> > and the sequence of each dimension is fixed. The Cuboid ID appears in every
>> > part of Kylin's source code. This design couldn't be refactored easily. So
>> > I agree that a whole new design is necessary; in the long term it can help
>> > a lot.
>> >
>> > Best regards,
>> >
>> > Shaofeng Shi 史少锋
>> > Apache Kylin PMC,
>> > Apache Incubator PMC,
>> > Email: shaofeng...@apache.org
>> >
>> > Apache Kylin FAQ:
>> > https://kylin.apache.org/docs/gettingstarted/faq.html
>> > Join Kylin user mail group: user-subscr...@kylin.apache.org
>> > Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>> >
>> >
>> > Xiaoxiang Yu <x...@apache.org> wrote on Tue, Feb 14, 2023 at 14:22:
>> >
>> > > A formatted version of the discussion with the same content:
>> > >
>> > > ## Background ##
>> > >
>> > > As we discussed on the mailing list[2] last year, Kylin 4.0 has achieved
>> > > its goal in new storage (columnar file) and new query engine (Spark based),
>> > > and gained some adoption from the community.
>> > > But due to the old design from the early versions, Kylin 4.0 still keeps
>> > > some limitations from previous versions, such as the cap of 63 dimensions,
>> > > a cube structure that cannot be modified once built, etc. We think the only
>> > > way to solve those limitations is to do a whole redesign, especially in
>> > > the metadata.
>> > >
>> > > The good news is that Kyligence started to do that years ago, and its
>> > > commercial version has been verified by many customers in terms of its
>> > > functionality, performance and stability. Last year, Kyligence open sourced
>> > > its core under Apache License v2.0, and signed a CCLA with the Apache
>> > > Software Foundation. We staged it in a separate branch of the GitHub
>> > > repository for review[1]. Engineers from other teams such as eBay also
>> > > reviewed the codebase, and put forward many new ideas. We think that, based
>> > > on this codebase, Kylin will gain not only a flexible metadata design and a
>> > > faster computing engine, but also richer user scenarios.
>> > >
>> > > The new codebase has the following features compared with the latest
>> > > release (Kylin 4.0.3):
>> > >
>> > > - More flexible and enhanced data model
>> > >   * Allow adding new dimensions and measures to the existing data model
>> > >   * The model adapts to table schema changes while retaining the
>> > >     existing index on a best-effort basis
>> > >   * Support last-mile data transformation using Computed Column
>> > >   * Support raw query (non-aggregation query) using Table Index
>> > >   * Support changing dimension table (SCD2)
>> > > - Simplified metadata design
>> > >   * Merge DataModel and CubeDesc into new DataModel
>> > >   * Add DataFlow for more generic data sequence, e.g. streaming-like
>> > >     data flow
>> > >   * New metadata AuditLog for better cache synchronization
>> > > - More flexible index management
>> > >   * Add IndexPlan to support flexible index management
>> > >   * Add IndexEntity to support different index types
>> > >   * Add LayoutEntity to support different storage layouts of the same
>> > >     Index
>> > > - Toward a native and vectorized query engine
>> > >   * Experiment: Integrate with a native execution engine, leveraging
>> > >     Gluten
>> > >   * Support async query
>> > >   * Enhance cost-based index optimizer
>> > > - More
>> > >   * Build engine refactoring and performance optimization
>> > >   * New web UI based on Vue.js, a brand new front-end framework, to
>> > >     replace AngularJS
>> > >   * Smooth modeling process on one canvas
>> > >
>> > >
>> > > ## Proposal ##
>> > > So, I'd like to propose adopting the new codebase from Kyligence as Kylin's
>> > > future code base, i.e., Kylin 5. If accepted, we will request an IP
>> > > clearance in Apache Incubator for it as the next step.
>> > >
>> > >
>> > > ## Reference ##
>> > > https://github.com/apache/kylin/tree/kylin5
>> > > https://lists.apache.org/thread/4fkhyw1fyf0jg5cb18v7vxyqbn6vm3zv
>> > >
>> > >
>> > > --
>> > >
>> > > Best wishes to you !
>> > > From: Xiaoxiang Yu
>> > >
>> > >
>> > > At 2023-02-14 14:09:31, "Xiaoxiang Yu" <x...@apache.org> wrote:
>> > > >Background
>> > > >
>> > > >
>> > > >As we discussed on the mailing list[2] last year, Kylin 4.0 has achieved
>> > > >its goal in new storage (columnar file) and new query engine (Spark based),
>> > > >and gained some adoption from the community. But due to the old design
>> > > >from the early versions, Kylin 4.0 still keeps some limitations from
>> > > >previous versions, such as the cap of 63 dimensions, a cube structure that
>> > > >cannot be modified once built, etc. We think the only way to solve those
>> > > >limitations is to do a whole redesign, especially in the metadata.
>> > > >
>> > > >
>> > > >The good news is that Kyligence started to do that years ago, and
>> > > >its commercial version has been verified by many customers in terms of its
>> > > >functionality, performance and stability.
>> > > >Last year, Kyligence open sourced its core under Apache License v2.0,
>> > > >and signed a CCLA with the Apache Software Foundation. We staged it in a
>> > > >separate branch of the GitHub repository for review[1]. Engineers from
>> > > >other teams such as eBay also reviewed the codebase, and put forward many
>> > > >new ideas. We think that, based on this codebase, Kylin will gain not only
>> > > >a flexible metadata design and a faster computing engine, but also richer
>> > > >user scenarios.
>> > > >
>> > > >
>> > > >The new codebase has the following features compared with the latest
>> > > >release (Kylin 4.0.3):
>> > > >More flexible and enhanced data model
>> > > >Allow adding new dimensions and measures to the existing data model
>> > > >The model adapts to table schema changes while retaining the existing
>> > > >index on a best-effort basis
>> > > >Support last-mile data transformation using Computed Column
>> > > >Support raw query (non-aggregation query) using Table Index
>> > > >Support changing dimension table (SCD2)
>> > > >Simplified metadata design
>> > > >Merge DataModel and CubeDesc into new DataModel
>> > > >Add DataFlow for more generic data sequence, e.g. streaming-like data flow
>> > > >New metadata AuditLog for better cache synchronization
>> > > >More flexible index management
>> > > >Add IndexPlan to support flexible index management
>> > > >Add IndexEntity to support different index types
>> > > >Add LayoutEntity to support different storage layouts of the same Index
>> > > >Toward a native and vectorized query engine
>> > > >Experiment: Integrate with a native execution engine, leveraging Gluten
>> > > >Support async query
>> > > >Enhance cost-based index optimizer
>> > > >More
>> > > >Build engine refactoring and performance optimization
>> > > >New web UI based on Vue.js, a brand new front-end framework, to replace
>> > > >AngularJS
>> > > >Smooth modeling process on one canvas
>> > > >Proposal
>> > > >So, I'd like to propose adopting the new codebase from Kyligence as Kylin's
>> > > >future code base, i.e., Kylin 5. If accepted, we will request an IP
>> > > >clearance in Apache Incubator for it as the next step.
>> > > >Reference
>> > > >https://github.com/apache/kylin/tree/kylin5
>> > > >https://lists.apache.org/thread/4fkhyw1fyf0jg5cb18v7vxyqbn6vm3zv
>> > > >https://kylin.apache.org/5.0/blog/introduction_of_metastore_cn
>> > > >
>> > > >--
>> > > >
>> > > >Best wishes to you !
>> > > >From: Xiaoxiang Yu
>> >
>> > > >--
>
>---------------------
>
>Best regards,
>
>Ni Chunen / George