Thank you very much for your prompt response. I still have several questions and will ask for your help later.
Best regards and have a good day.

On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:

> Done. GitHub branch changed to kylin5.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > A JIRA ticket has been opened, waiting for INFRA:
> > https://issues.apache.org/jira/browse/INFRA-25238
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Thank you Xiaoxiang, please update me when you have changed your
> >> default branch. If people are impressed by the numbers, I hope to
> >> turn this situation around.
> >>
> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >>
> >>> The default branch is for 4.X, which is a maintained branch; the
> >>> active branch is kylin5.
> >>> I will change the default branch to kylin5 later.
> >>>
> >>> ------------------------
> >>> With warm regard
> >>> Xiaoxiang Yu
> >>>
> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>
> >>>> Hi Xiaoxiang, Sirs / Madams
> >>>>
> >>>> Can you see the attached photo?
> >>>>
> >>>> My boss asked why Druid commits code regularly while Kylin has had
> >>>> no commits since July.
> >>>>
> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>
> >>>>> I think so.
> >>>>>
> >>>>> Response time is not the only factor in making a decision. Kylin
> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >>>>> model, and Kylin can guarantee reasonable query latency.
> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
> >>>>>
> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >>>>> unified data analytics services for their customers.
> >>>>>
> >>>>> ------------------------
> >>>>> With warm regard
> >>>>> Xiaoxiang Yu
> >>>>>
> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>
> >>>>>> Hi Xiaoxiang, thank you
> >>>>>>
> >>>>>> If my client uses a cloud computing service such as GCP or AWS,
> >>>>>> which will cost more: Kylin's precalculation feature or
> >>>>>> ClickHouse? (In Kylin's case, my thinking is that the query is
> >>>>>> executed once and the result is stored in a cube to be used many
> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
> >>>>>>
> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>>
> >>>>>> > The following text is part of an article
> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >>>>>> >
> >>>>>> > ===============================================================================
> >>>>>> >
> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
> >>>>>> > because of its precomputation technology, for example when the
> >>>>>> > join, group by, and where-condition patterns in SQL are
> >>>>>> > relatively fixed. The larger the data volume, the more obvious
> >>>>>> > the advantages of using Kylin. Its advantages are especially
> >>>>>> > large for deduplication (count distinct), Top N, Percentile, and
> >>>>>> > similar scenarios, and it is widely used for dashboards, all
> >>>>>> > kinds of reports, large-screen displays, traffic statistics, and
> >>>>>> > user behavior analysis. Meituan, Aurora, Shell Housing, etc.
> >>>>>> > use Kylin to build their data service platforms, serving
> >>>>>> > millions to tens of millions of queries per day, with most
> >>>>>> > queries completing within 2-3 seconds. There is no better
> >>>>>> > alternative for such a high-concurrency scenario.
> >>>>>> >
> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
> >>>>>> > power and is more suitable when query requests are flexible, or
> >>>>>> > when detail-level queries are needed at low concurrency.
> >>>>>> > Typical scenarios include user-label filtering over very many
> >>>>>> > columns with arbitrarily combined where conditions, and complex
> >>>>>> > ad hoc queries at low concurrency. If the data volume and access
> >>>>>> > volume are large, you need to deploy a distributed ClickHouse
> >>>>>> > cluster, which is a greater challenge for operation and
> >>>>>> > maintenance.
> >>>>>> >
> >>>>>> > If some queries are very flexible but infrequent, it is more
> >>>>>> > resource-efficient to compute them on the fly. Since the number
> >>>>>> > of queries is small, even if each query consumes a lot of
> >>>>>> > computational resources, it is still cost-effective overall. If
> >>>>>> > some queries have a fixed pattern and the query volume is large,
> >>>>>> > Kylin is the better fit: large computational resources are spent
> >>>>>> > once to compute and save the results, and that upfront cost is
> >>>>>> > amortized over every query, which makes it the most economical
> >>>>>> > option.
> >>>>>> >
> >>>>>> > --- Translated with DeepL.com (free version)
> >>>>>> >
> >>>>>> > ------------------------
> >>>>>> > With warm regard
> >>>>>> > Xiaoxiang Yu
> >>>>>> >
> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >
> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
> >>>>>> >> That's great.
> >>>>>> >>
> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered
> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds,
> >>>>>> >> which is faster than my demonstration (I used Kylin to
> >>>>>> >> calculate 1 billion rows in 2.9 seconds).
> >>>>>> >>
> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
> >>>>>> >> so that I can defend my demonstration?
> >>>>>> >>
> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>> >>
> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
> >>>>>> >> > reason here is that kylin has lag time due to model update of
> >>>>>> >> > new segment build, is that correct?"
> >>>>>> >> >
> >>>>>> >> > You are correct.
> >>>>>> >> >
> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >>>>>> >> > combination of ... "
> >>>>>> >> >
> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >>>>>> >> > completed but not released), which can reduce the time lag to
> >>>>>> >> > about 3 minutes (that is my estimate, but I am quite certain
> >>>>>> >> > about it).
> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that
> >>>>>> >> > periodically performs micro-batch aggregation and persistence.
> >>>>>> >> > The price is that you need to run and monitor a long-running
> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
> >>>>>> >> > knowledge of it.
> >>>>>> >> >
> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >>>>>> >> > tolerate?
> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most
> >>>>>> >> > cases.
> >>>>>> >> >
> >>>>>> >> > ------------------------
> >>>>>> >> > With warm regard
> >>>>>> >> > Xiaoxiang Yu
> >>>>>> >> >
> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> >
> >>>>>> >> > > Druid is better in
> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > >
> >>>>>> >> > > ==========================
> >>>>>> >> > >
> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >>>>>> >> > >
> >>>>>> >> > > In this important scenario of realtime analytics, the
> >>>>>> >> > > reason here is that kylin has lag time due to model update
> >>>>>> >> > > of new segment build, is that correct?
> >>>>>> >> > >
> >>>>>> >> > > If that is true, then can you suggest a work-around of
> >>>>>> >> > > combination of:
> >>>>>> >> > >
> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
> >>>>>> >> > > realtime capability?
> >>>>>> >> > >
> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
> >>>>>> >> > > and integrate it with (time-lag kylin cube).
> >>>>>> >> > >
> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>> >> > >
> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know
> >>>>>> >> > > > much about how Druid has changed in these two years. The
> >>>>>> >> > > > new features that I know of are: new UI, fully on K8s,
> >>>>>> >> > > > etc.).
> >>>>>> >> > > >
> >>>>>> >> > > > Here are some cases where you should consider using Druid
> >>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta
> >>>>>> >> > > > with the Druid that I used two years ago):
> >>>>>> >> > > >
> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > > > - Most queries are small (based on my test results, I
> >>>>>> >> > > >   think Druid had better response time for small queries
> >>>>>> >> > > >   two years ago).
> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use
> >>>>>> >> > > >   K8s or a public cloud platform as your deployment
> >>>>>> >> > > >   platform.
> >>>>>> >> > > >
> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> >>>>>> >> > > > could be better, like:
> >>>>>> >> > > >
> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> >>>>>> >> > > >   have a more exact-match/fine-grained index for queries
> >>>>>> >> > > >   containing different `Group By dimensions`.
> >>>>>> >> > > > - User-friendly UI for modeling.
> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did
> >>>>>> >> > > >   not show that it supports ODBC well).
> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >>>>>> >> > > >
> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >>>>>> >> > > > I hope this helps; feel free to share your own opinion.
> >>>>>> >> > > >
> >>>>>> >> > > > ------------------------
> >>>>>> >> > > > With warm regard
> >>>>>> >> > > > Xiaoxiang Yu
> >>>>>> >> > > >
> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> > > >
> >>>>>> >> > > >> Dear Xiaoxiang,
> >>>>>> >> > > >> Sirs/Madams,
> >>>>>> >> > > >>
> >>>>>> >> > > >> May I post my boss's question:
> >>>>>> >> > > >>
> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> >>>>>> >> > > >> compared to Pinot and Druid?
> >>>>>> >> > > >>
> >>>>>> >> > > >> Please kindly let me know
> >>>>>> >> > > >>
> >>>>>> >> > > >> Thank you very much and best regards