Hi Zhang Chen,

I have created DSIP-018 for this [1]. However, you need to create an account and tell me your username.
[1] https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model

--
Best Regards
陈明雨 Mingyu Chen
Email: morning...@apache.org

At 2022-06-23 22:29:49, "Chen Zhang" <chzhang1...@gmail.com> wrote:
>@Minghong, we'll use a multi-version delete bitmap and only save the delta
>for each version.
>For example, suppose we have a rowset with version [0-98]. Transaction 99
>updates some rows in that rowset, and so do transactions 100 and 101. There
>would then be 3 delete bitmaps on that rowset, corresponding to the rows
>updated by versions 99, 100 and 101. A query with version x will only see
>the bitmaps up to version x.
>There are more details about space saving and cache acceleration; let's
>discuss them in the DSIP.
>
>@Xiaoli, our team has finished most of the development work for the basic
>functionality in our private repository, but there is still a lot of work
>to do. You are welcome to get involved.
>
>@Mingyu, could you help create a DSIP doc? I don't seem to have permission.
>
>Best
>Chen Zhang
>On Jun 23, 2022, 21:41 +0800, Zhou Minghong <minghong.z...@163.com>, wrote:
>> Hi Chen Zhang
>> One question about "and a delete bitmap (marks some rowid as deleted)":
>> how is transaction information handled by a bitmap?
>> For example, transaction_100 deletes a row. The row is still visible to
>> transaction_99, but not visible to transaction_101. How is this case
>> handled?
>>
>> Br/Minghong
>>
>> At 2022-06-23 19:14:58, "Zhu,Xiaoli" <zhuxiaol...@baidu.com> wrote:
>> > Hi Chen Zhang,
>> >
>> > I am very interested in this topic and want to participate in the
>> > development.
>> >
>> > On 2022/6/23 at 2:44 PM, "Chen Zhang" <chzhang1...@gmail.com> wrote:
>> >
>> > Hi devs,
>> >
>> > The Unique-Key data model is widely used in scenarios like Flink-CDC,
>> > user profiles, and e-commerce orders, but the query performance of the
>> > current Merge-On-Read implementation is not good, for the following
>> > reasons:
>> >
>> > 1. Doris can't determine whether a row in a segment file is the latest
>> > or outdated, so it has to do an extra merge sort before getting the
>> > latest data, and key comparison is quite CPU-intensive.
>> > 2. Aggregate function predicate pushdown is not supported by the
>> > Unique-Key data model due to reason (1).
>> >
>> > I'd like to propose supporting a Merge-On-Write implementation for the
>> > Unique-Key data model. It leverages a new segment-file-level primary
>> > key index (used for point lookups on write) and a delete bitmap (which
>> > marks some rowids as deleted), and it can improve read performance
>> > significantly.
>> >
>> > At the beginning, we wanted to add another Primary-Key data model with
>> > a Merge-On-Write implementation, but after a lot of discussion, we
>> > prefer to improve the Unique-Key data model rather than add another
>> > one.
>> >
>> > I'll add the detailed design and related research to the DSIP doc later.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@doris.apache.org
>> > For additional commands, e-mail: dev-h...@doris.apache.org
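
As a rough illustration of the multi-version delete bitmap described in Chen Zhang's reply in this thread, here is a minimal Python sketch. It is not the Doris implementation: the names `Rowset`, `mark_deleted` and `visible_rows` are hypothetical, plain Python sets stand in for real bitmaps, and the row ids are made up. It only shows the idea that each version stores its own delta and that a query at version x unions the bitmaps up to x.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Rowset:
    """Hypothetical rowset with one delete bitmap (delta) per version."""

    num_rows: int
    # version -> row ids marked deleted by that version only (the delta)
    delete_bitmaps: Dict[int, Set[int]] = field(default_factory=dict)

    def mark_deleted(self, version: int, row_ids: Set[int]) -> None:
        """Record the rows that the given version updated or deleted."""
        self.delete_bitmaps.setdefault(version, set()).update(row_ids)

    def visible_rows(self, query_version: int) -> List[int]:
        """Return row ids visible to a query reading at query_version."""
        deleted: Set[int] = set()
        for version, bitmap in self.delete_bitmaps.items():
            # Only bitmaps up to the query version apply to this reader.
            if version <= query_version:
                deleted |= bitmap
        return [row for row in range(self.num_rows) if row not in deleted]


# A rowset written by versions [0-98], then updated by 99, 100 and 101.
rowset = Rowset(num_rows=5)
rowset.mark_deleted(99, {0})
rowset.mark_deleted(100, {1})
rowset.mark_deleted(101, {2})

rows_at_99 = rowset.visible_rows(99)    # row 1 is still visible here
rows_at_101 = rowset.visible_rows(101)  # row 1 is hidden here
```

In this sketch, the row deleted by version 100 stays visible to a reader at version 99 and is hidden from a reader at version 101, which is the case Minghong asked about.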