Hi Ryan,

Thanks for the feedback. Unfortunately, I was not able to join the Iceberg community sync meeting yesterday. I promise I will join the next ones.
I think the proposal is very interesting, as are the discussion and comments in the document. I agree that some points should be discussed further. I propose to update the document with your points/questions.

Thanks!
Regards
JB

On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <b...@tabular.io> wrote:
>
> Renjie, thanks for the proposal.
>
> We talked about this today in the Iceberg community sync and the general
> feedback was that we're excited to work on this, but the proposal left a few
> areas unclear. There are a few decisions about how to manage the delete
> vectors that need to be added to the design. For example:
> 1. Would there be only one delete vector per data file?
> 2. Would this require merge of existing vectors and new deletes at write time?
> 3. How would the data file for a vector be identified?
> 4. If multiple vectors are allowed, what is the plan for keeping the number
> of delete vectors small?
> 5. Would we allow writing multiple delete vectors into the same file?
> 6. How would we track which files are affected by a combined file of delete
> vectors?
> 7. What are the details of the proposed file format?
>
> In short, we just want to better understand how all this would work.
>
> Thanks!
>
> Ryan
>
>
> On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>
>> Hi, all:
>>
>> I have a proposal to introduce deletion vector file to reduce write
>> amplification of iceberg table:
>>
>> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
>>
>> Welcome to comment, and looking forward to hearing your advice.
>
>
> --
> Ryan Blue
> Tabular