Re: Pinot/Kylin/Druid quick comparison

Nam Đỗ Duy via user Wed, 06 Dec 2023 02:53:18 -0800

Hi Xiaoxiang, tomorrow my team holds its main presentation comparing Kylin
and Druid.


I found this article and would like you to update me on the advantages Kylin
has gained from 2018 until now (especially with the upcoming version 5).

Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
<https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>

On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <[email protected]> wrote:

> Thank you very much for your prompt response. I still have several
> questions and will ask for your help with them later.
>
> Best regards and have a good day
>
>
>
> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <[email protected]> wrote:
>
>> Done. Github branch changed to kylin5.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <[email protected]> wrote:
>>
>> > A JIRA ticket has been opened, waiting for INFRA :
>> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <[email protected]> wrote:
>> >
>> >> Thank you Xiaoxiang, please update me when you have changed your default
>> >> branch. If people are impressed by the numbers, I hope to turn this
>> >> situation around.
>> >>
>> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <[email protected]> wrote:
>> >>
>> >>> The default branch is for 4.X, which is a maintenance branch; the active
>> >>> branch is kylin5.
>> >>> I will change the default branch to kylin5 later.
>> >>>
>> >>> ------------------------
>> >>> With warm regard
>> >>> Xiaoxiang Yu
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Hi Xiaoxiang, Sirs / Madams
>> >>>>
>> >>>> Can you see the attached photo?
>> >>>>
>> >>>> My boss asked why Druid receives commits regularly while Kylin has had
>> >>>> no commits since July.
>> >>>>
>> >>>>
>> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <[email protected]> wrote:
>> >>>>
>> >>>>> I think so.
>> >>>>>
>> >>>>> Response time is not the only factor in making a decision. Kylin could
>> >>>>> be cheaper when the query pattern suits the Kylin model, and Kylin can
>> >>>>> guarantee reasonable query latency. ClickHouse will be quicker in an
>> >>>>> ad hoc query scenario.
>> >>>>>
>> >>>>> By the way, Youzan and Kyligence combine the two to provide
>> >>>>> unified data analytics services for their customers.
>> >>>>>
>> >>>>> ------------------------
>> >>>>> With warm regard
>> >>>>> Xiaoxiang Yu
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Xiaoxiang, thank you
>> >>>>>>
>> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >>>>>> which will cost more: Kylin's precalculation or ClickHouse? (In the
>> >>>>>> case of Kylin, my thinking is that the query execution is done once
>> >>>>>> and stored in the cube to be reused many times, so Kylin uses less
>> >>>>>> cloud computation. Is that true?)
>> >>>>>>
>> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <[email protected]> wrote:
>> >>>>>>
>> >>>>>> > The following text is part of an article
>> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >>>>>> >
>> >>>>>> > ===============================================================================
>> >>>>>> >
>> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>> >>>>>> > of its pre-calculation technology, for example when the join, group
>> >>>>>> > by, and where-condition patterns in SQL are relatively fixed. The
>> >>>>>> > larger the data volume, the more obvious the advantages of using
>> >>>>>> > Kylin. In particular, Kylin's advantages in deduplication (count
>> >>>>>> > distinct), Top N, Percentile, and similar scenarios are especially
>> >>>>>> > large, and it is used in many scenarios, such as dashboards, all
>> >>>>>> > kinds of reports, large-screen displays, traffic statistics, and user
>> >>>>>> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
>> >>>>>> > build their data service platforms, providing millions to tens of
>> >>>>>> > millions of queries per day, and most of the queries can be completed
>> >>>>>> > within 2 - 3 seconds. There is no better alternative for such a
>> >>>>>> > high-concurrency scenario.
>> >>>>>> >
>> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
>> >>>>>> > and is more suitable when the query requests are more flexible, or
>> >>>>>> > when there is a need for detail-level queries with low concurrency.
>> >>>>>> > Scenarios include user-label filtering, where very many columns and
>> >>>>>> > where conditions are combined arbitrarily, and complex ad hoc queries
>> >>>>>> > with low concurrency. If the data volume and query volume are large,
>> >>>>>> > you need to deploy a distributed ClickHouse cluster, which is a
>> >>>>>> > greater challenge for operations and maintenance.
>> >>>>>> >
>> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >>>>>> > resource-efficient to compute them on the fly. Since the number of
>> >>>>>> > queries is small, even if each query consumes a lot of computational
>> >>>>>> > resources, it is still cost-effective overall. If some queries have a
>> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>> >>>>>> > it spends large computational resources up front to compute and save
>> >>>>>> > the results, and that upfront cost is amortized over every query, so
>> >>>>>> > it is the most economical.
>> >>>>>> >
>> >>>>>> > --- Translated with DeepL.com (free version)
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > ------------------------
>> >>>>>> > With warm regard
>> >>>>>> > Xiaoxiang Yu
>> >>>>>> >
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <[email protected]> wrote:
>> >>>>>> >
>> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>> >>>>>> >> great.
>> >>>>>> >>
>> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered us the
>> >>>>>> >> speed of calculating 8 billion rows in milliseconds, which is faster
>> >>>>>> >> than my demonstration (I used Kylin to calculate 1 billion rows in
>> >>>>>> >> 2.9 seconds).
>> >>>>>> >>
>> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>> >>>>>> >> that I can defend my demonstration?
>> >>>>>> >>
>> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <[email protected]> wrote:
>> >>>>>> >>
>> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here
>> >>>>>> >> > is that kylin has lag time due to model update of new segment build,
>> >>>>>> >> > is that correct?"
>> >>>>>> >> >
>> >>>>>> >> > You are correct.
>> >>>>>> >> >
>> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >>>>>> >> > combination of ... "
>> >>>>>> >> >
>> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>> >>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
>> >>>>>> >> > is my estimate, but I am quite certain about it).
>> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that periodically does
>> >>>>>> >> > micro-batch aggregation and persistence. The price is that you need
>> >>>>>> >> > to run and monitor a long-running job. This feature is based on Spark
>> >>>>>> >> > Streaming, so you need knowledge of it.
>> >>>>>> >> >
>> >>>>>> >> > I am curious: what is the maximum time lag your customers can
>> >>>>>> >> > tolerate? Personally, I guess a minute-level time lag is OK for most
>> >>>>>> >> > cases.
>> >>>>>> >> >
>> >>>>>> >> > ------------------------
>> >>>>>> >> > With warm regard
>> >>>>>> >> > Xiaoxiang Yu
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <[email protected]> wrote:
>> >>>>>> >> >
>> >>>>>> >> > > Druid is better in
>> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > >
>> >>>>>> >> > > ==========================
>> >>>>>> >> > >
>> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >>>>>> >> > >
>> >>>>>> >> > > In this important scenario of realtime analytics, the reason here is
>> >>>>>> >> > > that kylin has lag time due to model update of new segment build,
>> >>>>>> >> > > is that correct?
>> >>>>>> >> > >
>> >>>>>> >> > > If that is true, then can you suggest a work-around of combination
>> >>>>>> >> > > of:
>> >>>>>> >> > >
>> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
>> >>>>>> >> > > realtime capability ?
>> >>>>>> >> > >
>> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >>>>>> >> > > integrate it with (time - lag kylin cube).
>> >>>>>> >> > >
>> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <[email protected]> wrote:
>> >>>>>> >> > >
>> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about
>> >>>>>> >> > > > how Druid has changed in these two years. The new features that I
>> >>>>>> >> > > > know of are: new UI, fully on K8s, etc.).
>> >>>>>> >> > > >
>> >>>>>> >> > > > Here are some cases in which you should consider using Druid rather
>> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>> >>>>>> >> > > > that I used two years ago):
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
>> >>>>>> >> > > >   had better response time for small queries two years ago).
>> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>> >>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
>> >>>>>> >> > > >
>> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >>>>>> >> > > > better, like:
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>> >>>>>> >> > > >   exact-match/fine-grained Index for queries containing different
>> >>>>>> >> > > >   `Group By dimensions`.
>> >>>>>> >> > > > - User-friendly UI for modeling.
>> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
>> >>>>>> >> > > >   that it supports ODBC well).
>> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
>> >>>>>> >> > > >
>> >>>>>> >> > > > ------------------------
>> >>>>>> >> > > > With warm regard
>> >>>>>> >> > > > Xiaoxiang Yu
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <[email protected]> wrote:
>> >>>>>> >> > > >
>> >>>>>> >> > > >> Dear Xiaoxiang,
>> >>>>>> >> > > >> Sirs/Madams,
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> May I post my boss's question:
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> >>>>>> >> > > >> Pinot and Druid?
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Please kindly let me know
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Thank you very much and best regards
>> >>>>>> >> > > >>
>> >>>>>> >> > > >
>> >>>>>> >> > >
>> >>>>>> >> >
>> >>>>>> >>
>> >>>>>> >
>> >>>>>>
>> >>>>>
>>
>
