Re: Proposal: Introduce deletion vector file to reduce write amplification

Anton Okolnychyi Thu, 12 Oct 2023 21:34:46 -0700

I just realized I had not opened the doc for comments; that is fixed now. Any
feedback or alternative ideas are more than welcome!


I'd say we should equally consider all incoming ideas at the moment and see 
what would work best. We haven't agreed on anything yet, so I'd encourage 
everyone to participate in the discussion. 

I will be off next week and will take another look once I am back.

On 2023/10/12 01:20:50 Renjie Liu wrote:
> Hi, Anton:
> I've gone through the doc, and we are trying to solve the same problems of
> position deletes, but with different approaches. It's quite interesting.
> 
> On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi <[email protected]>
> wrote:
> 
> > I tried to summarize notes from our previous discussions here:
> >
> > https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/
> >
> > I am going to iterate on the doc later today.
> >
> > On 2023/10/11 07:06:07 Renjie Liu wrote:
> > > Hi, Russell:
> > >
> > >
> > > > The main things I’m still interested in are alternative approaches. I
> > > > think that some of the work that Anton is doing has shown some
> > > > different bottlenecks in applying delete files that I’m not sure are
> > > > addressed by this proposal.
> > >
> > >
> > > I'm also interested. Could you share some resources on the work that
> > > Anton is doing? I haven't seen it.
> > >
> > > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > > > delete file application in order to speed up planning. But this could
> > > > also be done with a puffin file indexing delete files to data files.
> > > > This would eliminate any planning cost while also allowing us to do more
> > > > complicated things like mapping multiple data files to a single delete
> > > > file as well as operate on a one to many data file to delete file
> > > > approach. Doing this approach would mean we would need to change any
> > > > existing metadata or introduce a new separate file type.
> > >
> > >
> > > Yes, we can improve planning performance by embedding the mapping in a
> > > puffin file. But I guess this may introduce other problems, such as
> > > conflicts when committing? IIUC, puffin files are used for table-level
> > > indexes or statistics.
> > >
> > > > I would also expect some POC experiments showing that the Spec is getting
> > > > the benefits that are hypothesized.
> > >
> > >
> > > I will conduct some POC experiments with actual data, but they may take
> > > some time to implement.
> > >
> > > > The proposal I think also needs to address any possible limitations with
> > > > this approach. They don’t all need to be solved but we should at least
> > > > be exploring them. As a quick example, how does using single delete
> > > > files interact with our commit logic? I would guess that a single
> > > > delete file approach would make it more difficult to perform multiple
> > > > deletes concurrently?
> > >
> > >
> > > Good suggestion, I'm working on updating the doc to complement it with
> > > sketches for DML operations. IIUC, by potential conflicts in performing
> > > multiple deletes concurrently, you mean concurrent writes from different
> > > DML jobs? If so, I think the current solution still has the same problem,
> > > since these are in fact conflicts from concurrent updates. But I do admit
> > > that the deletion vector approach makes conflicts more likely since it
> > > works at the file level.
> > >
> > >
> > >
> > > On Tue, Oct 10, 2023 at 8:54 AM Russell Spitzer <
> > [email protected]>
> > > wrote:
> > >
> > > > The main things I’m still interested in are alternative approaches. I
> > > > think that some of the work that Anton is doing has shown some
> > > > different bottlenecks in applying delete files that I’m not sure are
> > > > addressed by this proposal.
> > > >
> > > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > > > delete file application in order to speed up planning. But this could
> > > > also be done with a puffin file indexing delete files to data files.
> > > > This would eliminate any planning cost while also allowing us to do more
> > > > complicated things like mapping multiple data files to a single delete
> > > > file as well as operate on a one to many data file to delete file
> > > > approach. Doing this approach would mean we would need to change any
> > > > existing metadata or introduce a new separate file type.
> > > >
> > > > I think basically for every “benefit” outlined we should think about if
> > > > there is an alternative approach that would achieve the same benefit.
> > > > Then we should analyze whether or not the proposal is the best solution
> > > > for that particular benefit and do some work to calculate what that
> > > > benefit would be and what drawbacks there might be.
> > > >
> > > > I would also expect some POC experiments showing that the Spec is
> > > > getting the benefits that are hypothesized.
> > > >
> > > > The proposal I think also needs to address any possible limitations
> > > > with this approach. They don’t all need to be solved but we should at
> > > > least be exploring them. As a quick example, how does using single
> > > > delete files interact with our commit logic? I would guess that a single
> > > > delete file approach would make it more difficult to perform multiple
> > > > deletes concurrently?
> > > >
> > > > Sent from my iPad
> > > >
> > > > On Oct 8, 2023, at 9:22 PM, Renjie Liu <[email protected]>
> > wrote:
> > > >
> > > >
> > > > Hi, Ryan:
> > > > Thanks for your reply.
> > > >
> > > > 1. What is the exact file format for these on disk that you're
> > proposing?
> > > >> Even if you're saying that it is what is produced by roaring bitmap,
> > we
> > > >> need more information. Is that a portable format? Do you wrap it at
> > all in
> > > >> the file to carry extra metadata? For example, the proposal says that
> > a
> > > >> starting position for a bitmap would be used. Where is that stored?
> > > >
> > > >
> > > > Sorry for the confusion, by file format I mean roaring bitmap's file
> > > > format
> > > > <https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>.
> > > > I checked that it has been implemented in several languages, such as
> > > > Java, Go, Rust, and C. Metadata will be stored in the manifest file, like
> > > > other entries such as data files and delete files. The starting position
> > > > doesn't need to be stored since it's used by the file reader. I think
> > > > your suggestion to provide an interface in the design will make things
> > > > clearer, and I will add it to the design doc.
> > > >
> > > > 2. How would DML operations work? Just a sketch would be great. I don't
> > > >> think it is a good idea to leave the implications for DML fuzzy.
> > > >
> > > >
> > > > I'll add sketches for other DML operations.
> > > >
> > > > 3. The comparison appears to be between rewriting data files and using
> > > >> delete vectors. I think it needs to compare the existing delete file
> > > >> formats to delete vectors so that we know why there is a benefit to
> > doing
> > > >> this beyond using the current positional delete files. The doc states
> > that
> > > >> there aren't measurements here, which I think we need. Otherwise,
> > should we
> > > >> just have a version of DML that produces one position delete per data
> > file?
> > > >
> > > >
> > > > I think deletion vector files are quite similar to position delete
> > > > files, e.g. you can think of a deletion vector file as one position
> > > > delete per data file. But this change brings new opportunities for
> > > > optimization, and there is a section discussing them in the design doc.
> > > > As for the measurements, I'll try to design some experiments for them.
> > > >
> > > > 4. I think this is missing some justification for how you're changing
> > data
> > > >> file metadata.
> > > >
> > > >
> > > > I agree with your comment that if we associate one deletion vector
> > > > with a data file, maybe it's better to extend the DataFile struct rather
> > > > than introducing new entries.
> > > >
> > > > I'll update the doc to address the comments.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue <[email protected]> wrote:
> > > >
> > > >> Thanks, Renjie. I went through and made some comments about what is
> > still
> > > >> not clear. Here's a summary:
> > > >>
> > > >> 1. What is the exact file format for these on disk that you're
> > proposing?
> > > >> Even if you're saying that it is what is produced by roaring bitmap,
> > we
> > > >> need more information. Is that a portable format? Do you wrap it at
> > all in
> > > >> the file to carry extra metadata? For example, the proposal says that
> > a
> > > >> starting position for a bitmap would be used. Where is that stored?
> > > >> 2. How would DML operations work? Just a sketch would be great. I
> > don't
> > > >> think it is a good idea to leave the implications for DML fuzzy.
> > > >> 3. The comparison appears to be between rewriting data files and using
> > > >> delete vectors. I think it needs to compare the existing delete file
> > > >> formats to delete vectors so that we know why there is a benefit to
> > doing
> > > >> this beyond using the current positional delete files. The doc states
> > that
> > > >> there aren't measurements here, which I think we need. Otherwise,
> > should we
> > > >> just have a version of DML that produces one position delete per data
> > file?
> > > >> 4. I think this is missing some justification for how you're changing
> > > >> data file metadata.
> > > >>
> > > >> On Sat, Oct 7, 2023 at 4:49 AM Renjie Liu <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi:
> > > > >>> I have addressed most comments in the document. I would like to ask
> > > > >>> what the next step is. Should we vote on whether to reject this spec
> > > > >>> or go on with it?
> > > >>>
> > > >>> On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu <[email protected]
> > >
> > > >>> wrote:
> > > >>>
> > > >>>> Hi:
> > > >>>> Sorry for the late reply, I have been busy recently. I've updated
> > the
> > > >>>> design with more details about your questions, and here is a
> > summary:
> > > >>>>
> > > >>>> > 1. Would there be only one delete vector per data file?
> > > > >>>> Yes. It's possible that we have multiple deletion vectors per very
> > > > >>>> large data file to further reduce write amplification, but I'm not
> > > > >>>> sure if that's over-design.
> > > >>>>
> > > >>>> > 2. Would this require merge of existing vectors and new deletes at
> > > >>>> write time?
> > > >>>> Yes. Merging two bitmaps would be quite efficient.
> > > >>>>
> > > >>>> > 3. How would the data file for a vector be identified?
> > > > >>>> It will be stored in the manifest file. We will have one entry for
> > > > >>>> the deletion file, and we add an extra field `data_file_path` for the
> > > > >>>> associated data file path. See Changes to spec
> > > > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.p4vrosjzl14j>
> > > > >>>> for details, and Write process
> > > > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.tft7a34rd2be>
> > > > >>>> for an example.
> > > >>>>
> > > >>>> > 4. If multiple vectors are allowed, what is the plan for keeping
> > the
> > > >>>> number of delete vectors small?
> > > > >>>> I see multiple vectors per data file as an optimization for very
> > > > >>>> large data files, and I'm not sure if that's over-design.
> > > >>>>
> > > >>>> > 5. Would we allow writing multiple delete vectors into the same
> > file?
> > > > >>>> I don't want to do that. Merging delete vectors into one file raises
> > > > >>>> two concerns:
> > > >>>>
> > > >>>>    - Write amplification.
> > > >>>>    - It makes concurrent modification of data files difficult.
> > > >>>>
> > > >>>> > 6. How would we track which files are affected by a combined file
> > of
> > > >>>> delete vectors?
> > > >>>> Sorry, I don't quite get your point.
> > > >>>>
> > > >>>> > 7. What are the details of the proposed file format?
> > > > >>>> I think roaring bitmap would be a good candidate, but columnar
> > > > >>>> formats such as Parquet and ORC are also possible since they provide
> > > > >>>> great compression for boolean columns. I've mentioned it here
> > > > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.nrhcjanzai0v>
> > > >>>>
> > > >>>> On Thu, Sep 21, 2023 at 4:53 PM Jean-Baptiste Onofré <
> > [email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi Ryan,
> > > >>>>>
> > > >>>>> Thanks for the feedback. Unfortunately, I was not able to join the
> > > >>>>> Iceberg community sync meeting yesterday, I promise I will join the
> > > >>>>> next ones.
> > > >>>>>
> > > >>>>> I think the proposal is very interesting and also the
> > > >>>>> discussion/comments in the document. I agree that some points
> > should
> > > >>>>> be discussed further. I propose to update the document with your
> > > >>>>> points/questions.
> > > >>>>>
> > > >>>>> Thanks !
> > > >>>>>
> > > >>>>> Regards
> > > >>>>> JB
> > > >>>>>
> > > >>>>> On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <[email protected]> wrote:
> > > >>>>> >
> > > >>>>> > Renjie, thanks for the proposal.
> > > >>>>> >
> > > > >>>>> > We talked about this today in the Iceberg community sync and the
> > > > >>>>> > general feedback was that we're excited to work on this, but the
> > > > >>>>> > proposal left a few areas unclear. There are a few decisions about
> > > > >>>>> > how to manage the delete vectors that need to be added to the
> > > > >>>>> > design. For example:
> > > >>>>> > 1. Would there be only one delete vector per data file?
> > > >>>>> > 2. Would this require merge of existing vectors and new deletes
> > at
> > > >>>>> write time?
> > > >>>>> > 3. How would the data file for a vector be identified?
> > > >>>>> > 4. If multiple vectors are allowed, what is the plan for keeping
> > the
> > > >>>>> number of delete vectors small?
> > > >>>>> > 5. Would we allow writing multiple delete vectors into the same
> > file?
> > > >>>>> > 6. How would we track which files are affected by a combined
> > file of
> > > >>>>> delete vectors?
> > > >>>>> > 7. What are the details of the proposed file format?
> > > >>>>> >
> > > >>>>> > In short, we just want to better understand how all this would
> > work.
> > > >>>>> >
> > > >>>>> > Thanks!
> > > >>>>> >
> > > >>>>> > Ryan
> > > >>>>> >
> > > >>>>> >
> > > >>>>> > On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <
> > [email protected]>
> > > >>>>> wrote:
> > > >>>>> >>
> > > >>>>> >> Hi, all:
> > > >>>>> >>
> > > >>>>> >>
> > > >>>>> >>
> > > > >>>>> >> I have a proposal to introduce a deletion vector file to reduce
> > > > >>>>> >> write amplification of Iceberg tables:
> > > >>>>> >>
> > > >>>>> >>
> > > >>>>>
> > https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
> > > >>>>> >>
> > > >>>>> >>
> > > >>>>> >>
> > > > >>>>> >> Comments are welcome, and I look forward to hearing your advice.
> > > >>>>> >
> > > >>>>> >
> > > >>>>> >
> > > >>>>> > --
> > > >>>>> > Ryan Blue
> > > >>>>> > Tabular
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Renjie Liu
> > > >>>> Software Engineer, MVAD
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Renjie Liu
> > > >>> Software Engineer, MVAD
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Ryan Blue
> > > >> Tabular
> > > >>
> > > >
> > > >
> > > > --
> > > > Renjie Liu
> > > > Software Engineer, MVAD
> > > >
> > > >
> > >
> > > --
> > > Renjie Liu
> > > Software Engineer, MVAD
> > >
> >
> 
> 
> -- 
> Renjie Liu
> Software Engineer, MVAD
>
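
On the concurrency question raised in the thread (whether one deletion vector per data file makes concurrent deletes harder), the following toy model may help frame the discussion. It is a sketch under stated assumptions, not Iceberg's commit logic: `ToyTable`, `commit_deletes`, and `CommitConflict` are hypothetical names, and the rule that a commit fails when the same file's vector was replaced after the writer's base version is an assumption made for illustration.

```python
# Toy model only: not Iceberg's commit protocol. All names are hypothetical.
class CommitConflict(Exception):
    """Raised when another writer replaced the same file's vector first."""


class ToyTable:
    def __init__(self):
        self.version = 0
        # data_file_path -> (version that wrote the vector, deleted positions)
        self.vectors = {}

    def commit_deletes(self, base_version, data_file_path, positions):
        """Replace the file's vector, failing if it changed after base_version."""
        written_at, deleted = self.vectors.get(data_file_path, (0, frozenset()))
        if written_at > base_version:
            raise CommitConflict(data_file_path)
        self.version += 1
        self.vectors[data_file_path] = (self.version,
                                        deleted | frozenset(positions))
        return self.version


table = ToyTable()
table.commit_deletes(0, "data/a.parquet", [1])      # job 1, based on version 0
table.commit_deletes(0, "data/b.parquet", [2])      # job 2, different file: fine
try:
    table.commit_deletes(0, "data/a.parquet", [3])  # job 3, same file as job 1
    conflicted = False
except CommitConflict:
    conflicted = True
```

In this model, jobs that touch different data files commit independently, while two jobs deleting from the same data file conflict and the later one has to refresh and retry against the current version.
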
