Re:Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

陈明雨 Thu, 23 Jun 2022 09:56:48 -0700

Hi Zhang Chen:
I have created DSIP-018 for this [1]. However, you need to create an account and
tell me your username.



[1] 
https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model




--

Best Regards
陈明雨 Mingyu Chen

Email:
[email protected]





At 2022-06-23 22:29:49, "Chen Zhang" <[email protected]> wrote:
>@Minghong We'll use a multi-version delete bitmap that saves only the delta for
>each version.
>For example, suppose we have a rowset with version [0-98], and transaction 99
>updates some rows in that rowset, as do transactions 100 and 101. There would
>then be 3 delete bitmaps on that rowset, corresponding to the rows updated by
>versions 99, 100 and 101. A query with version x will only see the bitmaps up
>to version x. There are more details about space saving and cache
>acceleration; let's discuss them in the DSIP.
>
>@Xiaoli, our team has finished most of the development work for the basic
>functionality in our private repository, but there's still a lot of work to
>do. You are welcome to get involved.
>
>@Mingyu, could you help create a DSIP doc? I don't seem to have permission.
>
>Best
>Chen Zhang
>On Jun 23, 2022, 21:41 +0800, Zhou Minghong <[email protected]>, wrote:
>> Hi Chen Zhang
>> One question about "and a delete bitmap (marks some rowid as deleted)":
>> how does a bitmap handle transaction information?
>> For example, transaction_100 deletes a row, but the row is still visible to
>> transaction_99 and not visible to transaction_101. How is this case handled?
>>
>>
>> Br/Minghong
>>
>> At 2022-06-23 19:14:58, "Zhu,Xiaoli" <[email protected]> wrote:
>> > Hi Chen Zhang,
>> >
>> > I am very interested in this topic, and want to participate in the 
>> > development.
>> >
>> > On 2022/6/23 at 2:44 PM, "Chen Zhang" <[email protected]> wrote:
>> >
>> > Hi devs,
>> >
>> > The Unique-Key data model is widely used in scenarios like Flink-CDC, user
>> > profiles, and e-commerce orders, but the query performance of the current
>> > Merge-On-Read implementation is poor, for the following reasons:
>> >
>> > 1. Doris can't determine whether a row in a segment file is the latest or
>> > outdated, so it has to do an extra merge sort before getting the
>> > latest data, and key comparison is quite CPU-intensive.
>> > 2. Aggregate function predicate pushdown is not supported by the
>> > Unique-Key data model due to reason (1).
>> >
>> > I'd like to propose supporting a Merge-On-Write implementation for the
>> > Unique-Key data model. It leverages a new segment-file-level primary
>> > key index (used for point lookups on write) and a delete bitmap (which marks
>> > some row IDs as deleted), and can improve read performance significantly.
>> >
>> > Initially, we wanted to add another Primary-Key data model with a
>> > Merge-On-Write implementation, but after a lot of discussion, we prefer
>> > to improve the Unique-Key data model rather than add another one.
>> >
>> > I'll add the detailed design and related research to the DSIP doc later.
>> >

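The multi-version delete bitmap described in the thread can be sketched roughly
as follows. This is only an illustrative model, not the Doris implementation:
the Rowset class, its key_index dictionary (standing in for the proposed
primary key index), and the use of Python sets instead of real bitmaps are all
assumptions made for the example. It shows a write recording the old row id in
a per-version delta, and a query at version x applying only the deltas up to x.

```python
from dataclasses import dataclass, field


@dataclass
class Rowset:
    """Hypothetical rowset: immutable rows plus one delete delta per version."""
    rows: list  # list of (key, value); the list position is the row id
    # version -> set of row ids deleted by that version (stand-in for a bitmap)
    delete_deltas: dict = field(default_factory=dict)
    # key -> row id (toy stand-in for the primary key index)
    key_index: dict = field(init=False, default_factory=dict)

    def __post_init__(self):
        self.key_index = {key: rowid for rowid, (key, _) in enumerate(self.rows)}

    def mark_deleted(self, version, key):
        """Write path: point-lookup the key, record the old row id in this version's delta."""
        rowid = self.key_index.get(key)
        if rowid is not None:
            self.delete_deltas.setdefault(version, set()).add(rowid)

    def visible_rows(self, query_version):
        """Read path: apply only the deltas whose version is <= query_version."""
        deleted = set()
        for version, delta in self.delete_deltas.items():
            if version <= query_version:
                deleted |= delta
        return [row for rowid, row in enumerate(self.rows) if rowid not in deleted]


# Rowset covering versions [0-98]; the new values written by transactions
# 99, 100 and 101 would live in newer rowsets, which are not modeled here.
rowset = Rowset(rows=[("a", 1), ("b", 2), ("c", 3), ("d", 4)])
rowset.mark_deleted(99, "a")
rowset.mark_deleted(100, "b")
rowset.mark_deleted(101, "c")

print(rowset.visible_rows(99))   # the row deleted by version 100 is still visible
print(rowset.visible_rows(101))  # all three deltas are applied
```

Because each version stores only its own delta, a reader at an older version
ignores later deltas and still sees the rows that later transactions replaced.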