Hi devs, I updated the DSIP last weekend. If you are interested in this feature, you are welcome to review and comment. Thanks.
https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
Best,
Chen Zhang

On Jun 24, 2022, 10:13 +0800, zhg yang <yangz...@gmail.com> wrote:
> @Chen Zhang For the more important features, it is best to send a DSIP
> first so that everyone can discuss the design.
> Thanks
> Yang Zhengguo
>
> On Thu, Jun 23, 2022 at 22:30, Chen Zhang <chzhang1...@gmail.com> wrote:
>
> > @Minghong We'll use a multi-version delete bitmap and only save the delta
> > for each version.
> > For example, suppose we have a rowset with version [0-98]. Transaction 99
> > updates some rows in that rowset, and so do transactions 100 and 101. There
> > would then be 3 delete bitmaps on that rowset, corresponding to the rows
> > updated by versions 99, 100 and 101. A query with version x will only see
> > the bitmaps up to version x. There are more details about space saving and
> > cache acceleration; let's discuss them in the DSIP.
> >
> > @Xiaoli, our team has finished most of the development work for the basic
> > functionality in our private repository, but there is still a lot of work
> > to do. You are welcome to get involved.
> >
> > @Mingyu, could you help create a DSIP doc? I don't seem to have
> > permission.
> >
> > Best
> > Chen Zhang
> > On Jun 23, 2022, 21:41 +0800, Zhou Minghong <minghong.z...@163.com> wrote:
> > > Hi Chen Zhang
> > > One question about "and a delete bitmap (marks some rowid as deleted)":
> > > how is transaction information handled by a bitmap?
> > > For example, transaction_100 deletes a row; the row is still visible to
> > > transaction_99 but not visible to transaction_101. How is this case
> > > handled?
> > >
> > > Br/Minghong
> > >
> > > At 2022-06-23 19:14:58, "Zhu,Xiaoli" <zhuxiaol...@baidu.com> wrote:
> > > > Hi Chen Zhang,
> > > >
> > > > I am very interested in this topic and want to participate in the
> > > > development.
> > > >
> > > > On 2022/6/23 at 2:44 PM, "Chen Zhang" <chzhang1...@gmail.com> wrote:
> > > >
> > > > Hi devs,
> > > >
> > > > The Unique-Key data model is widely used in scenarios like Flink-CDC,
> > > > user profiles, and e-commerce orders, but the query performance of the
> > > > current Merge-On-Read implementation is not good, for the following
> > > > reasons:
> > > >
> > > > 1. Doris can't determine whether a row in a segment file is the latest
> > > > or outdated, so it has to do an extra merge sort before getting the
> > > > latest data, and key comparison is quite CPU-intensive.
> > > > 2. Aggregate function predicate pushdown is not supported by the
> > > > Unique-Key data model due to reason (1).
> > > >
> > > > I'd like to propose supporting a Merge-On-Write implementation for the
> > > > Unique-Key data model, which leverages a new segment-file-level primary
> > > > key index (used for point lookup on write) and a delete bitmap (marks
> > > > some rowids as deleted), which can optimize read performance
> > > > significantly.
> > > >
> > > > At the beginning, we wanted to add another Primary-Key data model with
> > > > a Merge-On-Write implementation, but after a lot of discussion, we'd
> > > > prefer to improve the Unique-Key data model rather than add another
> > > > one.
> > > >
> > > > I'll add the detailed design and related research in the DSIP doc later.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
> > > > For additional commands, e-mail: dev-h...@doris.apache.org
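
For readers following the delete-bitmap discussion in this thread, the multi-version idea (one delta per version, with a reader at version x seeing only the deltas up to x) can be sketched roughly as below. This is only an illustration, not the Doris implementation: the class `RowsetDeleteBitmap` and its method names are hypothetical, and plain Python sets stand in for real compressed bitmaps.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RowsetDeleteBitmap:
    """Hypothetical per-rowset store of delete-bitmap deltas, keyed by version."""

    # version -> row ids marked deleted by that version (delta only).
    # Assumption: a Python set stands in for a real bitmap structure.
    deltas: Dict[int, Set[int]] = field(default_factory=dict)

    def mark_deleted(self, version: int, row_ids: Set[int]) -> None:
        """Record the rows of this rowset that `version` overwrote or deleted."""
        self.deltas.setdefault(version, set()).update(row_ids)

    def visible_deletes(self, read_version: int) -> Set[int]:
        """Union of all deltas whose version is <= the reader's version."""
        merged: Set[int] = set()
        for version, rows in self.deltas.items():
            if version <= read_version:
                merged |= rows
        return merged

    def is_visible(self, row_id: int, read_version: int) -> bool:
        """A row is visible unless a delta at or below read_version deleted it."""
        return row_id not in self.visible_deletes(read_version)


# Rowset [0-98]; transactions 99, 100 and 101 each update one row in it.
bitmap = RowsetDeleteBitmap()
bitmap.mark_deleted(99, {3})
bitmap.mark_deleted(100, {7})
bitmap.mark_deleted(101, {12})

# The row deleted by transaction 100 is still visible to a reader at
# version 99, but not to a reader at version 101.
print(bitmap.is_visible(7, 99), bitmap.is_visible(7, 101))
```

Storing only the delta per version keeps each bitmap small; the cost is a union at read time, which is presumably where the cache acceleration mentioned in the thread would come in.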