Hi Xiaoxiang, tomorrow my team has its main presentation comparing Kylin and Druid.
I found this article and would like you to update me on how Kylin's advantages have changed from 2018 until now (especially with version 5 about to be released):

Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
<https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>

On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you very much for your prompt response. I still have several
> questions that I will ask for your help with later.
>
> Best regards and have a good day
>
> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> Done. Github branch changed to kylin5.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>> > A JIRA ticket has been opened, waiting for INFRA:
>> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >
>> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> default branch. If people are impressed by the numbers, then I hope
>> >> to turn this situation around.
>> >>
>> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
>> >>
>> >>> The default branch is for 4.X, which is a maintained branch; the
>> >>> active branch is kylin5.
>> >>> I will change the default branch to kylin5 later.
>> >>>
>> >>> ------------------------
>> >>> With warm regard
>> >>> Xiaoxiang Yu
>> >>>
>> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>>
>> >>>> Hi Xiaoxiang, Sirs / Madams
>> >>>>
>> >>>> Can you see the attached photo?
>> >>>>
>> >>>> My boss asked why Druid commits code regularly but Kylin has not
>> >>>> had a commit since July.
>> >>>>
>> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
>> >>>>
>> >>>>> I think so.
>> >>>>>
>> >>>>> Response time is not the only factor in making a decision. Kylin
>> >>>>> could be cheaper when the query pattern is suitable for the Kylin
>> >>>>> model, and Kylin can guarantee reasonable query latency.
>> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
>> >>>>>
>> >>>>> By the way, Youzan and Kyligence combine them together to provide
>> >>>>> unified data analytics services for their customers.
>> >>>>>
>> >>>>> ------------------------
>> >>>>> With warm regard
>> >>>>> Xiaoxiang Yu
>> >>>>>
>> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>>>>
>> >>>>>> Hi Xiaoxiang, thank you
>> >>>>>>
>> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >>>>>> ClickHouse? (In the case of Kylin, my thinking is that the query
>> >>>>>> execution is done once and stored in a cube to be used many
>> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >>>>>>
>> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >>>>>>
>> >>>>>> > The following text is part of an article
>> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >>>>>> >
>> >>>>>> > ===============================================================================
>> >>>>>> >
>> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >>>>>> > because of its pre-calculation technology, for example when the
>> >>>>>> > join, group by, and where condition patterns in SQL are
>> >>>>>> > relatively fixed. The larger the data volume is, the more
>> >>>>>> > obvious the advantages of using Kylin are. In particular,
>> >>>>>> > Kylin's advantages in deduplication (count distinct), Top N,
>> >>>>>> > Percentile and other scenarios are especially large, and it is
>> >>>>>> > used in a large number of scenarios, such as dashboards, all
>> >>>>>> > kinds of reports, large-screen displays, traffic statistics,
>> >>>>>> > and user behavior analysis. Meituan, Aurora, Shell Housing,
>> >>>>>> > etc. use Kylin to build their data service platforms, providing
>> >>>>>> > millions to tens of millions of queries per day, and most of
>> >>>>>> > the queries can be completed within 2 - 3 seconds. There is no
>> >>>>>> > better alternative for such a high concurrency scenario.
>> >>>>>> >
>> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >>>>>> > power and is more suitable when the query requests are more
>> >>>>>> > flexible, or when there is a need for detailed queries with low
>> >>>>>> > concurrency. Typical scenarios include user-label filtering
>> >>>>>> > where a very large number of columns and where conditions are
>> >>>>>> > combined arbitrarily, and complex ad hoc queries with low
>> >>>>>> > concurrency.
>> >>>>>> > If the amount of data and access is large, you need to deploy a
>> >>>>>> > distributed ClickHouse cluster, which is a bigger challenge for
>> >>>>>> > operation and maintenance.
>> >>>>>> >
>> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >>>>>> > resource-efficient to compute them on the fly. Since the number
>> >>>>>> > of queries is small, even if each query consumes a lot of
>> >>>>>> > computational resources, it is still cost-effective overall. If
>> >>>>>> > some queries have a fixed pattern and the query volume is
>> >>>>>> > large, they are more suitable for Kylin: the results are
>> >>>>>> > computed once with large computational resources and saved, so
>> >>>>>> > the upfront computational cost can be amortized over every
>> >>>>>> > query, which makes it the most economical option.
>> >>>>>> >
>> >>>>>> > --- Translated with DeepL.com (free version)
>> >>>>>> >
>> >>>>>> > ------------------------
>> >>>>>> > With warm regard
>> >>>>>> > Xiaoxiang Yu
>> >>>>>> >
>> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>>>>> >
>> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature.
>> >>>>>> >> That's great.
>> >>>>>> >>
>> >>>>>> >> This morning there has been a new challenge to my team:
>> >>>>>> >> ClickHouse offered us the speed of calculating 8 billion rows
>> >>>>>> >> in milliseconds, which is faster than my demonstration (I used
>> >>>>>> >> Kylin to calculate 1 billion rows in 2.9 seconds).
>> >>>>>> >>
>> >>>>>> >> Can you briefly suggest the advantages of Kylin over
>> >>>>>> >> ClickHouse so that I can defend my demonstration?
>> >>>>>> >>
>> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >>>>>> >>
>> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
>> >>>>>> >> > reason here is that Kylin has lag time due to model update
>> >>>>>> >> > of new segment build, is that correct?"
>> >>>>>> >> >
>> >>>>>> >> > You are correct.
>> >>>>>> >> >
>> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >>>>>> >> > combination of ... "
>> >>>>>> >> >
>> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
>> >>>>>> >> > completed but not released), which can reduce the time lag
>> >>>>>> >> > to about 3 minutes (that is my estimation but I am quite
>> >>>>>> >> > certain about it).
>> >>>>>> >> > NRT stands for 'near real-time'; it will run a job that does
>> >>>>>> >> > micro-batch aggregation and persistence periodically. The
>> >>>>>> >> > price is that you need to run and monitor a long-running
>> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
>> >>>>>> >> > knowledge of it.
>> >>>>>> >> >
>> >>>>>> >> > I am curious: what is the maximum time lag your customers
>> >>>>>> >> > can tolerate?
>> >>>>>> >> > Personally, I guess minute-level time lag is OK for most
>> >>>>>> >> > cases.
>> >>>>>> >> >
>> >>>>>> >> > ------------------------
>> >>>>>> >> > With warm regard
>> >>>>>> >> > Xiaoxiang Yu
>> >>>>>> >> >
>> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>>>>> >> >
>> >>>>>> >> > > Druid is better in
>> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > >
>> >>>>>> >> > > ==========================
>> >>>>>> >> > >
>> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >>>>>> >> > >
>> >>>>>> >> > > In this important scenario of realtime analytics, the
>> >>>>>> >> > > reason here is that Kylin has lag time due to model update
>> >>>>>> >> > > of new segment build, is that correct?
>> >>>>>> >> > >
>> >>>>>> >> > > If that is true, then can you suggest a work-around that
>> >>>>>> >> > > combines:
>> >>>>>> >> > >
>> >>>>>> >> > > (time-lagged Kylin cube) + (realtime DB update) to provide
>> >>>>>> >> > > realtime capability?
>> >>>>>> >> > >
>> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
>> >>>>>> >> > > and integrate it with the (time-lagged Kylin cube).
>> >>>>>> >> > >
>> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
>> >>>>>> >> > >
>> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
>> >>>>>> >> > > > know too much about the changes in Druid in these two
>> >>>>>> >> > > > years. New features that I know are: new UI, fully on
>> >>>>>> >> > > > K8s etc).
>> >>>>>> >> > > >
>> >>>>>> >> > > > Here are some cases where you should consider using
>> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
>> >>>>>> >> > > > 5.0-beta with the Druid which I used two years ago):
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > > > - Most queries are small. (Based on my test result, I
>> >>>>>> >> > > >   think Druid had better response time for small queries
>> >>>>>> >> > > >   two years ago.)
>> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use
>> >>>>>> >> > > >   the K8S/public cloud platform as your deployment
>> >>>>>> >> > > >   platform.
>> >>>>>> >> > > >
>> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
>> >>>>>> >> > > > could be better, like:
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
>> >>>>>> >> > > >   have a more exact-match/fine-grained Index for queries
>> >>>>>> >> > > >   containing different `Group By dimensions`.
>> >>>>>> >> > > > - User-friendly UI for modeling.
>> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website
>> >>>>>> >> > > >   did not show that it supports ODBC well.)
>> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >>>>>> >> > > >
>> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >>>>>> >> > > > Hope this helps you, and you are free to share your
>> >>>>>> >> > > > opinion.
>> >>>>>> >> > > >
>> >>>>>> >> > > > ------------------------
>> >>>>>> >> > > > With warm regard
>> >>>>>> >> > > > Xiaoxiang Yu
>> >>>>>> >> > > >
>> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>>>>> >> > > >
>> >>>>>> >> > > >> Dear Xiaoxiang,
>> >>>>>> >> > > >> Sirs/Madams,
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> May I post my boss's question:
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>> >>>>>> >> > > >> compared to Pinot and Druid?
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Please kindly let me know
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Thank you very much and best regards
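The cost argument in the article quoted earlier in this thread (pre-computed results amortize an upfront build cost over many queries, while on-the-fly computation pays the full price on every query) can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the function names are hypothetical, and the cost figures are made-up units, not measurements of Kylin, ClickHouse, or any cloud provider.

```python
# Hypothetical cost model: all numbers below are made-up units, not measurements.

def total_cost(upfront: float, per_query: float, queries: int) -> float:
    """One-off build cost plus per-query cost times the number of queries."""
    return upfront + per_query * queries


def break_even_queries(upfront: float, precomputed_per_query: float,
                       on_the_fly_per_query: float) -> float:
    """Query count above which pre-computation becomes the cheaper option."""
    saving_per_query = on_the_fly_per_query - precomputed_per_query
    if saving_per_query <= 0:
        return float("inf")  # pre-computation never pays off
    return upfront / saving_per_query


if __name__ == "__main__":
    BUILD_COST = 100.0        # assumed one-off cost to build the index/cube
    PRECOMPUTED_QUERY = 0.01  # assumed cost of a query answered from it
    ON_THE_FLY_QUERY = 1.0    # assumed cost of computing the same query live

    print("break-even:",
          break_even_queries(BUILD_COST, PRECOMPUTED_QUERY, ON_THE_FLY_QUERY))
    for n in (10, 1_000, 100_000):
        pre = total_cost(BUILD_COST, PRECOMPUTED_QUERY, n)
        live = total_cost(0.0, ON_THE_FLY_QUERY, n)
        cheaper = "precomputed" if pre < live else "on the fly"
        print(n, "queries:", cheaper, "is cheaper")
```

With these assumed figures, a handful of queries favors computing on the fly, while a large, fixed-pattern query volume favors pre-computation. The real break-even point depends on actual build, storage, and query costs, which would have to be measured.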