Renjie, thanks for the proposal.

We talked about this today in the Iceberg community sync and the general
feedback was that we're excited to work on this, but the proposal left a few
areas unclear. There are a few decisions about how to manage the delete
vectors that need to be added to the design. For example:
1. Would there be only one delete vector per data file?
2. Would this require merging existing vectors with new deletes at write
time?
3. How would the data file for a vector be identified?
4. If multiple vectors are allowed, what is the plan for keeping the number
of delete vectors small?
5. Would we allow writing multiple delete vectors into the same file?
6. How would we track which files are affected by a combined file of delete
vectors?
7. What are the details of the proposed file format?

In short, we just want to better understand how all this would work.

Thanks!

Ryan


On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi, all:
>
>
>
> I have a proposal to introduce a deletion vector file to reduce write
> amplification of Iceberg tables:
>
>
> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
>
>
>
> Comments are welcome, and I look forward to hearing your advice.
>


-- 
Ryan Blue
Tabular
