Re: Proposal: Introduce deletion vector file to reduce write amplification

Jean-Baptiste Onofré Fri, 13 Oct 2023 01:55:19 -0700

That's a fair point and agree on that.

I think having some kind of performance numbers and comparison would
be helpful (depending of the use cases).


Regards
JB

On Fri, Oct 13, 2023 at 7:27 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>
>> I'd say we should equally consider all incoming ideas at the moment and see 
>> what would work best. We haven't agreed on anything yet, so I'd encourage 
>> everyone to participate in the discussion.
>
>
> Can't agree more. I think we share the same goal to improve performance of 
> iceberg, and welcome everyone to join the discussion.
>
> On Fri, Oct 13, 2023 at 12:35 PM Anton Okolnychyi <aokolnyc...@apache.org> 
> wrote:
>>
>> I just realized I did not open comments, fixed now. Any feedback or 
>> alternative ideas are more than welcome!
>>
>> I'd say we should equally consider all incoming ideas at the moment and see 
>> what would work best. We haven't agreed on anything yet, so I'd encourage 
>> everyone to participate in the discussion.
>>
>> I will be off next week, will take another look once back.
>>
>> On 2023/10/12 01:20:50 Renjie Liu wrote:
>> > Hi, Anton:
>> > I've gone through the doc, and we are trying to solve the same problems of
>> > position deletes, but with different approaches. It's quite interesting.
>> >
>> > On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi <aokolnyc...@apache.org>
>> > wrote:
>> >
>> > > I tried to summarize notes from our previous discussions here:
>> > >
>> > > https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/
>> > >
>> > > I am going to iterate on the doc later today.
>> > >
>> > > On 2023/10/11 07:06:07 Renjie Liu wrote:
>> > > > Hi, Russell:
>> > > >
>> > > >
>> > > > > The main things I’m still interested are alternative approaches. I
>> > > think
>> > > > > that some of the work that Anton is working on have shown some
>> > > different
>> > > > > bottlenecks in applying delete files that I’m not sure are addressed 
>> > > > > by
>> > > > > this proposal.
>> > > >
>> > > >
>> > > > I'm also interested. Could you share some resources on the work that
>> > > Anton
>> > > > is working? I didn't notice that.
>> > > >
>> > > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
>> > > > > delete file application in order to speed up planning. But this could
>> > > as be
>> > > > > done with a puffin file indexing delete files to data files. This 
>> > > > > would
>> > > > > eliminate any planning cost while also allowing us to do more
>> > > complicated
>> > > > > things like mapping multiple data files to a single delete file as
>> > > well as
>> > > > > operate on a one to many data file to delete file approach. Doing 
>> > > > > this
>> > > > > approach would mean we would need to change any existing metadata or
>> > > > > introduce a new separate file type.
>> > > >
>> > > >
>> > > > Yes, we can improve planning performance by embedding the mapping in a
>> > > > puffin file. But I guess this may introduce other problems like
>> > > conflicting
>> > > > when doing commits? IIUC, puffin file is used as table level index or
>> > > > statistics.
>> > > >
>> > > > I would also expect some POC experiments showing that the Spec is 
>> > > > getting
>> > > > > the benefit’s that are hypothesized.
>> > > >
>> > > >
>> > > >  I will conduct some poc experiments with actual data, but it may take
>> > > some
>> > > > time to implement it.
>> > > >
>> > > > The proposal I think also needs to address any possible limitations 
>> > > > with
>> > > > > this approach. They don’t all need to be solved but we should at 
>> > > > > least
>> > > > > being exploring them. As a quick example, how does using single 
>> > > > > delete
>> > > > > files interact with our commit logic? I would guess that a single
>> > > delete
>> > > > > file approach would make it more difficult to perform multiple 
>> > > > > deletes
>> > > > > concurrently?
>> > > >
>> > > >
>> > > > Good suggestion, I'm working on updating the doc to completement 
>> > > > sketches
>> > > > for dml operations. IIUC, for potential conflicts in performing 
>> > > > multiple
>> > > > deletes concurrently, you mean concurrent writes from different dml 
>> > > > jobs?
>> > > > If so, I think the current solution still has the same problem since 
>> > > > this
>> > > > is in fact conflicts from concurrent updates. But I do admit that the
>> > > > deletion vector approach makes confliction easier since it's file 
>> > > > level.
>> > > >
>> > > >
>> > > >
>> > > > On Tue, Oct 10, 2023 at 8:54 AM Russell Spitzer <
>> > > russell.spit...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > The main things I’m still interested are alternative approaches. I
>> > > think
>> > > > > that some of the work that Anton is working on have shown some
>> > > different
>> > > > > bottlenecks in applying delete files that I’m not sure are addressed 
>> > > > > by
>> > > > > this proposal.
>> > > > >
>> > > > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 
>> > > > > 1)
>> > > > > delete file application in order to speed up planning. But this could
>> > > as be
>> > > > > done with a puffin file indexing delete files to data files. This 
>> > > > > would
>> > > > > eliminate any planning cost while also allowing us to do more
>> > > complicated
>> > > > > things like mapping multiple data files to a single delete file as
>> > > well as
>> > > > > operate on a one to many data file to delete file approach. Doing 
>> > > > > this
>> > > > > approach would mean we would need to change any existing metadata or
>> > > > > introduce a new separate file type.
>> > > > >
>> > > > > I think basically for every “benefit” outlined we should think about 
>> > > > > if
>> > > > > there is an alternative approach that would achieve the same benefit.
>> > > Then
>> > > > > we should analyze or whether or not the proposal is the best solution
>> > > for
>> > > > > that particular benefit and do some work to calculate what that 
>> > > > > benefit
>> > > > > would be and what drawbacks there might be.
>> > > > >
>> > > > > I would also expect some POC experiments showing that the Spec is
>> > > getting
>> > > > > the benefit’s that are hypothesized.
>> > > > >
>> > > > > The proposal I think also needs to address any possible limitations
>> > > with
>> > > > > this approach. They don’t all need to be solved but we should at 
>> > > > > least
>> > > > > being exploring them. As a quick example, how does using single 
>> > > > > delete
>> > > > > files interact with our commit logic? I would guess that a single
>> > > delete
>> > > > > file approach would make it more difficult to perform multiple 
>> > > > > deletes
>> > > > > concurrently?
>> > > > >
>> > > > > Sent from my iPad
>> > > > >
>> > > > > On Oct 8, 2023, at 9:22 PM, Renjie Liu <liurenjie2...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > 
>> > > > > Hi, Ryan:
>> > > > > Thanks for your reply.
>> > > > >
>> > > > > 1. What is the exact file format for these on disk that you're
>> > > proposing?
>> > > > >> Even if you're saying that it is what is produced by roaring bitmap,
>> > > we
>> > > > >> need more information. Is that a portable format? Do you wrap it at
>> > > all in
>> > > > >> the file to carry extra metadata? For example, the proposal says 
>> > > > >> that
>> > > a
>> > > > >> starting position for a bitmap would be used. Where is that stored?
>> > > > >
>> > > > >
>> > > > > Sorry for the confusion, by file format I mean roaring bitmap's file
>> > > > > format <
>> > > https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>.
>> > > > > I checked that it has been implemented in several languages, such as
>> > > java,
>> > > > > go, rust, c. Metadata will be stored in manifest file as other 
>> > > > > entries
>> > > such
>> > > > > as datafile, deletion file. The starting position doesn't need to be
>> > > stored
>> > > > > since it's used by the file reader. I think your suggestion to 
>> > > > > provide
>> > > an
>> > > > > interface in design will make things clearer, and I will add it to 
>> > > > > the
>> > > > > design doc.
>> > > > >
>> > > > > 2. How would DML operations work? Just a sketch would be great. I 
>> > > > > don't
>> > > > >> think it is a good idea to leave the implications for DML fuzzy.
>> > > > >
>> > > > >
>> > > > > I'll add sketches for other DML operations.
>> > > > >
>> > > > > 3. The comparison appears to be between rewriting data files and 
>> > > > > using
>> > > > >> delete vectors. I think it needs to compare the existing delete file
>> > > > >> formats to delete vectors so that we know why there is a benefit to
>> > > doing
>> > > > >> this beyond using the current positional delete files. The doc 
>> > > > >> states
>> > > that
>> > > > >> there aren't measurements here, which I think we need. Otherwise,
>> > > should we
>> > > > >> just have a version of DML that produces one position delete per 
>> > > > >> data
>> > > file?
>> > > > >
>> > > > >
>> > > > > I think deletion vector files are quite similar to position delete
>> > > files,
>> > > > > e.g. you can think of a deletion vector file as one position delete 
>> > > > > per
>> > > > > data file. But this change brings new chances for optimization, and
>> > > there
>> > > > > is one section talking about it in the design doc. As with the
>> > > > > measurements, I'll try to design some experiments for it.
>> > > > >
>> > > > > 4. I think this is missing some justification for how you're changing
>> > > data
>> > > > >> file metadata.
>> > > > >
>> > > > >
>> > > > > I agree with your comment that if we associate one deletion vector
>> > > with a
>> > > > > data file, maybe it's better to extend the DataFile struct rather 
>> > > > > than
>> > > > > introducing new entries.
>> > > > >
>> > > > > I'll update the doc to address the comments.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue <b...@tabular.io> wrote:
>> > > > >
>> > > > >> Thanks, Renjie. I went through and made some comments about what is
>> > > still
>> > > > >> not clear. Here's a summary:
>> > > > >>
>> > > > >> 1. What is the exact file format for these on disk that you're
>> > > proposing?
>> > > > >> Even if you're saying that it is what is produced by roaring bitmap,
>> > > we
>> > > > >> need more information. Is that a portable format? Do you wrap it at
>> > > all in
>> > > > >> the file to carry extra metadata? For example, the proposal says 
>> > > > >> that
>> > > a
>> > > > >> starting position for a bitmap would be used. Where is that stored?
>> > > > >> 2. How would DML operations work? Just a sketch would be great. I
>> > > don't
>> > > > >> think it is a good idea to leave the implications for DML fuzzy.
>> > > > >> 3. The comparison appears to be between rewriting data files and 
>> > > > >> using
>> > > > >> delete vectors. I think it needs to compare the existing delete file
>> > > > >> formats to delete vectors so that we know why there is a benefit to
>> > > doing
>> > > > >> this beyond using the current positional delete files. The doc 
>> > > > >> states
>> > > that
>> > > > >> there aren't measurements here, which I think we need. Otherwise,
>> > > should we
>> > > > >> just have a version of DML that produces one position delete per 
>> > > > >> data
>> > > file?
>> > > > >> 4. I think this is missing some justification for how you're 
>> > > > >> changing
>> > > > >> data file metadata.
>> > > > >>
>> > > > >> On Sat, Oct 7, 2023 at 4:49 AM Renjie Liu <liurenjie2...@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Hi:
>> > > > >>> I have addressed most comments in the document. I would like to ask
>> > > > >>> what's the next step? Should we have a vote on this spec to reject
>> > > it or we
>> > > > >>> should go on with it?
>> > > > >>>
>> > > > >>> On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu 
>> > > > >>> <liurenjie2...@gmail.com
>> > > >
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>>> Hi:
>> > > > >>>> Sorry for the late reply, I have been busy recently. I've updated
>> > > the
>> > > > >>>> design with more details about your questions, and here is a
>> > > summary:
>> > > > >>>>
>> > > > >>>> > 1. Would there be only one delete vector per data file?
>> > > > >>>> Yes. It's possible that we have multiple deletion vectors per very
>> > > > >>>> large data file to further reduce write amplification, but I'm not
>> > > sure if
>> > > > >>>> it's over design.
>> > > > >>>>
>> > > > >>>> > 2. Would this require merge of existing vectors and new deletes 
>> > > > >>>> > at
>> > > > >>>> write time?
>> > > > >>>> Yes. Merging two bitmaps would be quite efficient.
>> > > > >>>>
>> > > > >>>> > 3. How would the data file for a vector be identified?
>> > > > >>>> It will be stored in the manifest file. We will have one  entry 
>> > > > >>>> for
>> > > > >>>> deletion file, and we add an extra field `data_file_path` for the
>> > > > >>>> associated data file path. See Changes to spec
>> > > > >>>> <
>> > > https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.p4vrosjzl14j>
>> > > for
>> > > > >>>> details, and Write process
>> > > > >>>> <
>> > > https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.tft7a34rd2be>
>> > > for
>> > > > >>>> example.
>> > > > >>>>
>> > > > >>>> > 4. If multiple vectors are allowed, what is the plan for keeping
>> > > the
>> > > > >>>> number of delete vectors small?
>> > > > >>>> I see multiple vectors per data file as an optimization for very
>> > > large
>> > > > >>>> data file, and I'm not sure if it's over design.
>> > > > >>>>
>> > > > >>>> > 5. Would we allow writing multiple delete vectors into the same
>> > > file?
>> > > > >>>> I don't want to do that. Merging delete vectors into one file have
>> > > two
>> > > > >>>> concerns:
>> > > > >>>>
>> > > > >>>>    - Write amplification.
>> > > > >>>>    - It makes concurrent modification of data files difficult.
>> > > > >>>>
>> > > > >>>> > 6. How would we track which files are affected by a combined 
>> > > > >>>> > file
>> > > of
>> > > > >>>> delete vectors?
>> > > > >>>> Sorry, I don't quite get your point.
>> > > > >>>>
>> > > > >>>> > 7. What are the details of the proposed file format?
>> > > > >>>> I think roaring bitmap would be a good candidate, but other 
>> > > > >>>> columnar
>> > > > >>>> formats such as parquet, orc are also possible since they provided
>> > > great
>> > > > >>>> compression for boolean columns. I've mentioned it here
>> > > > >>>> <
>> > > https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.nrhcjanzai0v
>> > > >
>> > > > >>>>
>> > > > >>>> On Thu, Sep 21, 2023 at 4:53 PM Jean-Baptiste Onofré <
>> > > j...@nanthrax.net>
>> > > > >>>> wrote:
>> > > > >>>>
>> > > > >>>>> Hi Ryan,
>> > > > >>>>>
>> > > > >>>>> Thanks for the feedback. Unfortunately, I was not able to join 
>> > > > >>>>> the
>> > > > >>>>> Iceberg community sync meeting yesterday, I promise I will join 
>> > > > >>>>> the
>> > > > >>>>> next ones.
>> > > > >>>>>
>> > > > >>>>> I think the proposal is very interesting and also the
>> > > > >>>>> discussion/comments in the document. I agree that some points
>> > > should
>> > > > >>>>> be discussed further. I propose to update the document with your
>> > > > >>>>> points/questions.
>> > > > >>>>>
>> > > > >>>>> Thanks !
>> > > > >>>>>
>> > > > >>>>> Regards
>> > > > >>>>> JB
>> > > > >>>>>
>> > > > >>>>> On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <b...@tabular.io> 
>> > > > >>>>> wrote:
>> > > > >>>>> >
>> > > > >>>>> > Renjie, thanks for the proposal.
>> > > > >>>>> >
>> > > > >>>>> > We talked about this today in the Iceberg community sync and 
>> > > > >>>>> > the
>> > > > >>>>> general feedback was that we're excited work on this, but the
>> > > proposal left
>> > > > >>>>> a few areas unclear. There are a few decisions about how to 
>> > > > >>>>> manage
>> > > the
>> > > > >>>>> delete vectors that need to be added to the design. For example:
>> > > > >>>>> > 1. Would there be only one delete vector per data file?
>> > > > >>>>> > 2. Would this require merge of existing vectors and new deletes
>> > > at
>> > > > >>>>> write time?
>> > > > >>>>> > 3. How would the data file for a vector be identified?
>> > > > >>>>> > 4. If multiple vectors are allowed, what is the plan for 
>> > > > >>>>> > keeping
>> > > the
>> > > > >>>>> number of delete vectors small?
>> > > > >>>>> > 5. Would we allow writing multiple delete vectors into the same
>> > > file?
>> > > > >>>>> > 6. How would we track which files are affected by a combined
>> > > file of
>> > > > >>>>> delete vectors?
>> > > > >>>>> > 7. What are the details of the proposed file format?
>> > > > >>>>> >
>> > > > >>>>> > In short, we just want to better understand how all this would
>> > > work.
>> > > > >>>>> >
>> > > > >>>>> > Thanks!
>> > > > >>>>> >
>> > > > >>>>> > Ryan
>> > > > >>>>> >
>> > > > >>>>> >
>> > > > >>>>> > On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <
>> > > liurenjie2...@gmail.com>
>> > > > >>>>> wrote:
>> > > > >>>>> >>
>> > > > >>>>> >> Hi, all:
>> > > > >>>>> >>
>> > > > >>>>> >>
>> > > > >>>>> >>
>> > > > >>>>> >> I have a proposal to introduce deletion vector file to reduce
>> > > write
>> > > > >>>>> amplification of iceberg table:
>> > > > >>>>> >>
>> > > > >>>>> >>
>> > > > >>>>>
>> > > https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
>> > > > >>>>> >>
>> > > > >>>>> >>
>> > > > >>>>> >>
>> > > > >>>>> >> Welcome to comment, and looking forward to hear your advice.
>> > > > >>>>> >
>> > > > >>>>> >
>> > > > >>>>> >
>> > > > >>>>> > --
>> > > > >>>>> > Ryan Blue
>> > > > >>>>> > Tabular
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> --
>> > > > >>>> Renjie Liu
>> > > > >>>> Software Engineer, MVAD
>> > > > >>>>
>> > > > >>>
>> > > > >>>
>> > > > >>> --
>> > > > >>> Renjie Liu
>> > > > >>> Software Engineer, MVAD
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Ryan Blue
>> > > > >> Tabular
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Renjie Liu
>> > > > > Software Engineer, MVAD
>> > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > Renjie Liu
>> > > > Software Engineer, MVAD
>> > > >
>> > >
>> >
>> >
>> > --
>> > Renjie Liu
>> > Software Engineer, MVAD
>> >
>
>
>
> --
> Renjie Liu
> Software Engineer, MVAD

Re: Proposal: Introduce deletion vector file to reduce write amplification

Reply via email to