Hi Chen Zhang,

I am very interested in this topic and would like to participate in the development.

On 2022/6/23 at 2:44 PM, "Chen Zhang" <chzhang1...@gmail.com> wrote:

    Hi devs,

    The Unique-Key data model is widely used in scenarios such as Flink-CDC,
    user profiles, and e-commerce orders, but the query performance of the
    current Merge-On-Read implementation is poor, for the following reasons:

       1. Doris can't determine whether a row in a segment file is the latest
       version or an outdated one, so it has to perform an extra merge sort
       before returning the latest data, and key comparison is quite
       CPU-intensive.
       2. Aggregate function predicate pushdown is not supported by the
       Unique-Key data model because of reason (1).
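To make reason (1) concrete, here is a minimal, simplified model of a Merge-On-Read scan. It is not Doris's actual read path. It assumes each segment is a list of (key, value) rows sorted by key with unique keys, segments are ordered oldest first, and the function name `merge_on_read` is hypothetical. Every row goes through heap-based key comparisons, even when no key was ever overwritten.

```python
import heapq


def merge_on_read(segments):
    """Return the latest (key, value) per key from a list of segments.

    Simplified model: each segment is a list of (key, value) sorted by key
    with unique keys; segments are ordered oldest first.
    """
    # Tag each row with -segment_number so that, for equal keys, the row from
    # the newest segment sorts first in the merged stream.
    tagged = [
        [(key, -seg_no, value) for key, value in seg]
        for seg_no, seg in enumerate(segments)
    ]
    result = []
    last_key = object()  # sentinel that never equals a real key
    # k-way merge sort across all segments: this is the per-row key
    # comparison cost paid on every read.
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence is the newest version
            result.append((key, value))
            last_key = key
    return result
```

For example, `merge_on_read([[(1, "a"), (2, "b")], [(1, "a2"), (3, "c")]])` returns `[(1, "a2"), (2, "b"), (3, "c")]`.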

    I'd like to propose supporting a Merge-On-Write implementation for the
    Unique-Key data model, which leverages a new segment-file-level primary
    key index (used for point lookups on write) and a delete bitmap (which
    marks rowids as deleted). This can improve read performance significantly.
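A toy in-memory model may help illustrate the idea. This is a sketch under stated assumptions, not the proposed design (which is to be described in the DSIP doc). The class and method names (`Segment`, `MergeOnWriteTable`, `write`, `scan`) are hypothetical. A Python dict stands in for the per-segment primary key index, and a set of rowids per segment stands in for the delete bitmap. Writes pay for point lookups; reads only skip deleted rowids, with no cross-segment merge sort.

```python
class Segment:
    """Hypothetical immutable segment: a row's rowid is its position in rows."""

    def __init__(self, rows):
        self.rows = list(rows)  # list of (key, value) tuples
        # Stand-in for the segment-level primary key index: key -> rowid.
        # If a key repeats inside one segment, the last rowid wins.
        self.key_index = {}
        for rowid, (key, _) in enumerate(self.rows):
            self.key_index[key] = rowid


class MergeOnWriteTable:
    def __init__(self):
        self.segments = []
        # Stand-in for the delete bitmap: segment id -> set of deleted rowids.
        self.delete_bitmap = {}

    def write(self, rows):
        seg = Segment(rows)
        # Rows superseded by a later row in the same batch are deleted too.
        deleted = {
            rowid
            for rowid, (key, _) in enumerate(seg.rows)
            if seg.key_index[key] != rowid
        }
        # Point lookup of each new key, newest segment first. Older copies
        # were already marked deleted when a newer one was written, so only
        # the first hit needs to be marked.
        for key in seg.key_index:
            for old_id in range(len(self.segments) - 1, -1, -1):
                old_rowid = self.segments[old_id].key_index.get(key)
                if old_rowid is not None:
                    self.delete_bitmap[old_id].add(old_rowid)
                    break
        self.delete_bitmap[len(self.segments)] = deleted
        self.segments.append(seg)

    def scan(self):
        # Read path: just skip rowids marked in the delete bitmap. No key
        # comparison across segments, so per-segment aggregation is possible.
        for seg_id, seg in enumerate(self.segments):
            dead = self.delete_bitmap[seg_id]
            for rowid, row in enumerate(seg.rows):
                if rowid not in dead:
                    yield row
```

Usage: after `t = MergeOnWriteTable()`, `t.write([(1, "a"), (2, "b")])` and `t.write([(1, "a2")])`, `sorted(t.scan())` gives `[(1, "a2"), (2, "b")]`, and rowid 0 of the first segment is recorded in `t.delete_bitmap[0]`.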

    At first, we wanted to add a separate Primary-Key data model with a
    Merge-On-Write implementation, but after much discussion, we'd prefer to
    improve the Unique-Key data model rather than add another one.

    I'll add the detailed design and related research to the DSIP doc later.
