Re: Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

zhg yang Thu, 23 Jun 2022 19:13:28 -0700

@Chen Zhang For the more important features, it is best to send a DSIP
first so that everyone can discuss the design.
Thanks
Yang Zhengguo



On Thu, Jun 23, 2022 at 22:30, Chen Zhang <[email protected]> wrote:

> @Minghong We'll use a multi-version delete bitmap that only saves the delta
> for each version.
> For example, suppose we have a rowset with version [0-98], and transactions
> 99, 100 and 101 each update some rows in that rowset. There would then be
> 3 delete bitmaps on that rowset, corresponding to the rows updated by
> versions 99, 100 and 101. A query at version x will only see the bitmaps up
> to version x. There are more details about space saving and cache
> acceleration; let's discuss them in the DSIP.
>
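
A minimal sketch of the multi-version delete bitmap described above, in
Python. The names (`Rowset`, `mark_deleted`, `deleted_rows`, `visible_rows`)
are hypothetical, and plain Python sets stand in for real bitmaps. It only
illustrates the visibility rule (a query at version x merges the deltas up to
x) and is not the actual Doris implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Rowset:
    """A rowset plus per-version delete-bitmap deltas (hypothetical structure)."""
    start_version: int
    end_version: int
    # version -> row ids of this rowset overwritten by that version (delta only)
    delete_deltas: dict[int, set[int]] = field(default_factory=dict)

    def mark_deleted(self, version: int, row_ids: set[int]) -> None:
        """Record the rows of this rowset that `version` updated elsewhere."""
        self.delete_deltas.setdefault(version, set()).update(row_ids)

    def deleted_rows(self, query_version: int) -> set[int]:
        """Union of all deltas whose version is <= query_version."""
        merged: set[int] = set()
        for version, rows in self.delete_deltas.items():
            if version <= query_version:
                merged |= rows
        return merged

    def visible_rows(self, query_version: int, num_rows: int) -> list[int]:
        """Row ids a query at `query_version` is allowed to read."""
        deleted = self.deleted_rows(query_version)
        return [r for r in range(num_rows) if r not in deleted]


# Rowset [0-98] with 5 rows; transactions 99, 100 and 101 each update one row.
rowset = Rowset(start_version=0, end_version=98)
rowset.mark_deleted(99, {1})
rowset.mark_deleted(100, {3})
rowset.mark_deleted(101, {4})

print(rowset.visible_rows(99, 5))   # row 3 is still visible at version 99
print(rowset.visible_rows(101, 5))  # rows 1, 3 and 4 are hidden at version 101
```

In this sketch a row deleted by transaction 100 stays visible to a reader at
version 99 and is hidden from a reader at version 101, because each reader
only merges the deltas up to its own version.
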
> @Xiaoli, our team has finished most of the development work for the basic
> functionality in our private repository, but there is still a lot of work
> to do. You are welcome to get involved.
>
> @Mingyu, could you help create a DSIP doc? I don't seem to have
> permission.
>
> Best
> Chen Zhang
> On Jun 23, 2022, 21:41 +0800, Zhou Minghong <[email protected]>,
> wrote:
> > Hi Chen Zhang
> > One question about "and a delete bitmap (marks some rowid as deleted)":
> > how does a bitmap handle transaction information?
> > For example, transaction_100 deletes a row, but the row is still visible to
> > transaction_99 and not visible to transaction_101. How is this case handled?
> >
> >
> > Br/Minghong
> >
> > At 2022-06-23 19:14:58, "Zhu,Xiaoli" <[email protected]> wrote:
> > > Hi Chen Zhang,
> > >
> > > I am very interested in this topic and would like to participate in the
> > > development.
> > >
> > > On 2022/6/23 at 2:44 PM, "Chen Zhang" <[email protected]> wrote:
> > >
> > > Hi devs,
> > >
> > > The Unique-Key data model is widely used in scenarios like Flink-CDC, user
> > > profiles, and e-commerce orders, but the query performance of the current
> > > Merge-On-Read implementation is not good, for the following reasons:
> > >
> > > 1. Doris can't determine whether a row in a segment file is the latest or
> > > outdated, so it has to do an extra merge sort before getting the
> > > latest data, and key comparison is quite CPU-intensive.
> > > 2. Aggregate function predicate pushdown is not supported by the
> > > Unique-Key data model due to reason (1).
> > >
> > > I'd like to propose supporting a Merge-On-Write implementation for the
> > > Unique-Key data model. It leverages a new segment-file-level primary
> > > key index (used for point lookups on write) and a delete bitmap (which
> > > marks some rowids as deleted), and can significantly improve read
> > > performance.
> > >
> > > At the beginning, we wanted to add another Primary-Key data model with a
> > > Merge-On-Write implementation, but after a lot of discussion, we'd
> > > prefer to improve the Unique-Key data model rather than add another one.
> > >
> > > I'll add the detailed design and related research to the DSIP doc later.
> > >
> > >
> > >
>
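
To make the proposal in the original mail more concrete, here is a toy Python
sketch of a merge-on-write write path: each segment carries a primary key
index, and a write looks up its keys in the older segments and marks the old
rows in their delete bitmaps, so a scan can skip them without a merge sort.
All names (`MowTable`, `write_segment`, `scan`) are hypothetical, dicts and
sets stand in for the real index and bitmap, and versions are left out for
brevity; the actual design is to be described in the DSIP.

```python
class MowTable:
    """Toy merge-on-write table: keys are unique and the newest write wins."""

    def __init__(self) -> None:
        self.segments: list[list[tuple[str, int]]] = []  # rows per segment
        self.pk_index: list[dict[str, int]] = []         # per segment: key -> row id
        self.delete_bitmap: list[set[int]] = []          # per segment: deleted row ids

    def write_segment(self, rows: list[tuple[str, int]]) -> None:
        """Flush one batch as a new segment, marking overwritten rows as deleted."""
        latest = dict(rows)  # within a batch, the last value for a key wins
        for key in latest:
            # Point lookup on write: find older copies of this key.
            for seg_id, index in enumerate(self.pk_index):
                row_id = index.get(key)
                if row_id is not None:
                    self.delete_bitmap[seg_id].add(row_id)
        new_rows = list(latest.items())
        self.segments.append(new_rows)
        self.pk_index.append({key: i for i, (key, _) in enumerate(new_rows)})
        self.delete_bitmap.append(set())

    def scan(self) -> list[tuple[str, int]]:
        """Read path: skip rows marked deleted; no key comparison or merge."""
        out: list[tuple[str, int]] = []
        for seg_id, rows in enumerate(self.segments):
            deleted = self.delete_bitmap[seg_id]
            out.extend(row for i, row in enumerate(rows) if i not in deleted)
        return out


table = MowTable()
table.write_segment([("user1", 10), ("user2", 20)])
table.write_segment([("user1", 11), ("user3", 30)])
print(table.scan())
```

The second write overwrites `user1`, so row 0 of the first segment is marked
deleted at write time and the scan returns only the latest row per key.
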
