Druid quick comparison

Nam Đỗ Duy via user Tue, 05 Dec 2023 18:35:33 -0800

Thank you very much for your prompt response. I still have several
questions and will ask for your help with them later.


Best regards and have a good day



On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:

> Done. Github branch changed to kylin5.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > A JIRA ticket has been opened, waiting for INFRA :
> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Thank you Xiaoxiang, please update me when you have changed your default
> >> branch. If people are impressed by the numbers, I hope this will turn
> >> the situation around.
> >>
> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >>
> >>> The default branch is for 4.X, which is a maintenance branch; the active
> >>> branch is kylin5.
> >>> I will change the default branch to kylin5 later.
> >>>
> >>> ------------------------
> >>> With warm regard
> >>> Xiaoxiang Yu
> >>>
> >>>
> >>>
> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >>> wrote:
> >>>
> >>>> Hi Xiaoxiang, Sirs / Madams
> >>>>
> >>>> Can you see the attached photo?
> >>>>
> >>>> My boss asked why Druid has code committed regularly while Kylin has had
> >>>> no commits since July.
> >>>>
> >>>>
> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>
> >>>>> I think so.
> >>>>>
> >>>>> Response time is not the only factor in making a decision. Kylin could
> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
> >>>>> in an ad hoc query scenario.
> >>>>>
> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >>>>> unified data analytics services for their customers.
> >>>>>
> >>>>> ------------------------
> >>>>> With warm regard
> >>>>> Xiaoxiang Yu
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Xiaoxiang, thank you
> >>>>>>
> >>>>>> If my client uses a cloud computing service like GCP or AWS, which
> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
> >>>>>> (In the case of Kylin, my thinking is that the query execution is done
> >>>>>> once and stored in a cube to be used many times, so Kylin uses less
> >>>>>> cloud computation. Is that true?)
> >>>>>>
> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>>
> >>>>>> > The following text is part of an article
> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >>>>>> >
> >>>>>> > ===================================================================
> >>>>>> >
> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> >>>>>> > of its precalculation technology, for example when the join, group by,
> >>>>>> > and where-condition patterns in SQL are relatively fixed. The larger
> >>>>>> > the data volume, the more obvious the advantages of using Kylin. Kylin
> >>>>>> > is particularly advantageous for deduplication (count distinct), Top N,
> >>>>>> > and Percentile, and it is used in a large number of scenarios, such as
> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
> >>>>>> > etc. use Kylin to build their data service platforms, serving millions
> >>>>>> > to tens of millions of queries per day, and most of the queries
> >>>>>> > complete within 2-3 seconds. There is no better alternative for such a
> >>>>>> > high-concurrency scenario.
> >>>>>> >
> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
> >>>>>> > and is more suitable when query requests are more flexible, or when
> >>>>>> > detailed queries are needed at low concurrency. Scenarios include
> >>>>>> > user-label filtering where very many columns and where conditions are
> >>>>>> > combined arbitrarily, and complex ad hoc queries at modest
> >>>>>> > concurrency. If the data volume and query traffic are large, you need
> >>>>>> > to deploy a distributed ClickHouse cluster, which is a greater
> >>>>>> > challenge for operation and maintenance.
> >>>>>> >
> >>>>>> > If some queries are very flexible but infrequent, it is more
> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> >>>>>> > queries is small, even if each query consumes a lot of computational
> >>>>>> > resources, it is still cost-effective overall. If some queries have a
> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
> >>>>>> > a large amount of computation is spent once to save the results, and
> >>>>>> > that upfront cost is amortized over every query, so it is the most
> >>>>>> > economical.
> >>>>>> >
> >>>>>> > --- Translated with DeepL.com (free version)
> >>>>>> >
> >>>>>> >
> >>>>>> > ------------------------
> >>>>>> > With warm regard
> >>>>>> > Xiaoxiang Yu
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >
> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
> >>>>>> >> great.
> >>>>>> >>
> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered us the
> >>>>>> >> speed of calculating 8 billion rows in milliseconds, which is faster
> >>>>>> >> than my demonstration (I used Kylin to calculate 1 billion rows in
> >>>>>> >> 2.9 seconds).
> >>>>>> >>
> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> >>>>>> >> that I can defend my demonstration?
> >>>>>> >>
> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>> >>
> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> >>>>>> >> > here is that kylin has lag time due to model update of new segment
> >>>>>> >> > build, is that correct?"
> >>>>>> >> >
> >>>>>> >> > You are correct.
> >>>>>> >> >
> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >>>>>> >> > combination of ... "
> >>>>>> >> >
> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
> >>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
> >>>>>> >> > is my estimate, but I am quite certain about it).
> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that does micro-batch
> >>>>>> >> > aggregation and persistence periodically. The price is that you need
> >>>>>> >> > to run and monitor a long-running job. This feature is based on Spark
> >>>>>> >> > Streaming, so you need knowledge of it.
> >>>>>> >> >
> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >>>>>> >> > tolerate?
> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >>>>>> >> >
> >>>>>> >> > ------------------------
> >>>>>> >> > With warm regard
> >>>>>> >> > Xiaoxiang Yu
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> >
> >>>>>> >> > > Druid is better in
> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > >
> >>>>>> >> > > ==========================
> >>>>>> >> > >
> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >>>>>> >> > >
> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
> >>>>>> >> > > is that kylin has lag time due to model update of new segment
> >>>>>> >> > > build, is that correct?
> >>>>>> >> > >
> >>>>>> >> > > If that is true, then can you suggest a work-around of combination
> >>>>>> >> > > of:
> >>>>>> >> > >
> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
> >>>>>> >> > > realtime capability?
> >>>>>> >> > >
> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >>>>>> >> > > integrate it with (time-lag kylin cube).
> >>>>>> >> > >
> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >>>>>> >> > >
> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >>>>>> >> > > > about how Druid has changed in these two years. New features
> >>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
> >>>>>> >> > > >
> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >>>>>> >> > > > that I used two years ago):
> >>>>>> >> > > >
> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
> >>>>>> >> > > >   had better response time for small queries two years ago).
> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
> >>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
> >>>>>> >> > > >
> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >>>>>> >> > > > better, like:
> >>>>>> >> > > >
> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> >>>>>> >> > > >   different `Group By dimensions`.
> >>>>>> >> > > > - User-friendly UI for modeling.
> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website did not
> >>>>>> >> > > >   show that it supports ODBC well.)
> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >>>>>> >> > > > Hope this helps, and feel free to share your opinion.
> >>>>>> >> > > >
> >>>>>> >> > > > ------------------------
> >>>>>> >> > > > With warm regard
> >>>>>> >> > > > Xiaoxiang Yu
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> > > >
> >>>>>> >> > > >> Dear Xiaoxiang,
> >>>>>> >> > > >> Sirs/Madams,
> >>>>>> >> > > >>
> >>>>>> >> > > >> May I post my boss's question:
> >>>>>> >> > > >>
> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
> >>>>>> >> > > >> to Pinot and Druid?
> >>>>>> >> > > >>
> >>>>>> >> > > >> Please kindly let me know
> >>>>>> >> > > >>
> >>>>>> >> > > >> Thank you very much and best regards
> >>>>>> >> > > >>
> >>>>>> >> > > >
> >>>>>> >> > >
> >>>>>> >> >
> >>>>>> >>
> >>>>>> >
> >>>>>>
> >>>>>
>
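
The cost question raised in the thread (precompute once and reuse, versus
scanning on every query) can be sketched in a few lines of Python. This is
only a minimal illustration, not Kylin or ClickHouse code: the sample rows,
the function names, and the cost units are all hypothetical, and real costs
depend on storage, cluster size, and query mix.

```python
import math
from collections import defaultdict

# Hypothetical fact rows (day, city, amount); not data from the thread.
ROWS = [
    ("2023-12-01", "Hanoi", 10),
    ("2023-12-01", "Hue", 5),
    ("2023-12-02", "Hanoi", 7),
    ("2023-12-02", "Hanoi", 3),
]


def build_index(rows):
    """Scan every row once and store SUM(amount) per (day, city)."""
    index = defaultdict(int)
    for day, city, amount in rows:
        index[(day, city)] += amount
    return dict(index)


def query_on_the_fly(rows, day, city):
    """Answer by scanning all rows; work grows with len(rows)."""
    return sum(a for d, c, a in rows if d == day and c == city)


def query_precomputed(index, day, city):
    """Answer with a single lookup in the precomputed index."""
    return index.get((day, city), 0)


def break_even_queries(build_cost, scan_cost, lookup_cost):
    """Smallest query count at which precomputing is strictly cheaper overall.

    Costs are in arbitrary, assumed units. Returns None when a lookup is
    not cheaper than a scan, so precomputation never pays off.
    """
    saving = scan_cost - lookup_cost
    if saving <= 0:
        return None
    return math.floor(build_cost / saving) + 1


if __name__ == "__main__":
    index = build_index(ROWS)
    # Both paths return the same answer; only the work per query differs.
    print(query_on_the_fly(ROWS, "2023-12-02", "Hanoi"))
    print(query_precomputed(index, "2023-12-02", "Hanoi"))
    print(break_even_queries(build_cost=100, scan_cost=10, lookup_cost=1))
```

With the assumed costs above (build 100, scan 10, lookup 1), precomputation
becomes cheaper in total from the twelfth query onward; below that, scanning
on the fly costs less. This mirrors the fixed-pattern, high-volume case
described in the quoted article.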

Re: Pinot/Kylin/Druid quick comparison
