Re: Proposal: Introduce deletion vector file to reduce write amplification

Renjie Liu Wed, 11 Oct 2023 18:21:19 -0700

Hi, Anton:
I've gone through the doc, and we are trying to solve the same problems of
position deletes, but with different approaches. It's quite interesting.


On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi <[email protected]>
wrote:

> I tried to summarize notes from our previous discussions here:
>
> https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/
>
> I am going to iterate on the doc later today.
>
> On 2023/10/11 07:06:07 Renjie Liu wrote:
> > Hi, Russell:
> >
> >
> > > The main things I’m still interested are alternative approaches. I
> think
> > > that some of the work that Anton is working on have shown some
> different
> > > bottlenecks in applying delete files that I’m not sure are addressed by
> > > this proposal.
> >
> >
> > I'm also interested. Could you share some resources on the work that
> Anton
> > is working? I didn't notice that.
> >
> > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > > delete file application in order to speed up planning. But this could
> as be
> > > done with a puffin file indexing delete files to data files. This would
> > > eliminate any planning cost while also allowing us to do more
> complicated
> > > things like mapping multiple data files to a single delete file as
> well as
> > > operate on a one to many data file to delete file approach. Doing this
> > > approach would mean we would need to change any existing metadata or
> > > introduce a new separate file type.
> >
> >
> > Yes, we can improve planning performance by embedding the mapping in a
> > puffin file. But I guess this may introduce other problems like
> conflicting
> > when doing commits? IIUC, puffin file is used as table level index or
> > statistics.
> >
> > I would also expect some POC experiments showing that the Spec is getting
> > > the benefit’s that are hypothesized.
> >
> >
> >  I will conduct some poc experiments with actual data, but it may take
> some
> > time to implement it.
> >
> > The proposal I think also needs to address any possible limitations with
> > > this approach. They don’t all need to be solved but we should at least
> > > being exploring them. As a quick example, how does using single delete
> > > files interact with our commit logic? I would guess that a single
> delete
> > > file approach would make it more difficult to perform multiple deletes
> > > concurrently?
> >
> >
> > Good suggestion, I'm working on updating the doc to completement sketches
> > for dml operations. IIUC, for potential conflicts in performing multiple
> > deletes concurrently, you mean concurrent writes from different dml jobs?
> > If so, I think the current solution still has the same problem since this
> > is in fact conflicts from concurrent updates. But I do admit that the
> > deletion vector approach makes confliction easier since it's file level.
> >
> >
> >
> > On Tue, Oct 10, 2023 at 8:54 AM Russell Spitzer <
> [email protected]>
> > wrote:
> >
> > > The main things I’m still interested are alternative approaches. I
> think
> > > that some of the work that Anton is working on have shown some
> different
> > > bottlenecks in applying delete files that I’m not sure are addressed by
> > > this proposal.
> > >
> > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > > delete file application in order to speed up planning. But this could
> as be
> > > done with a puffin file indexing delete files to data files. This would
> > > eliminate any planning cost while also allowing us to do more
> complicated
> > > things like mapping multiple data files to a single delete file as
> well as
> > > operate on a one to many data file to delete file approach. Doing this
> > > approach would mean we would need to change any existing metadata or
> > > introduce a new separate file type.
> > >
> > > I think basically for every “benefit” outlined we should think about if
> > > there is an alternative approach that would achieve the same benefit.
> Then
> > > we should analyze or whether or not the proposal is the best solution
> for
> > > that particular benefit and do some work to calculate what that benefit
> > > would be and what drawbacks there might be.
> > >
> > > I would also expect some POC experiments showing that the Spec is
> getting
> > > the benefit’s that are hypothesized.
> > >
> > > The proposal I think also needs to address any possible limitations
> with
> > > this approach. They don’t all need to be solved but we should at least
> > > being exploring them. As a quick example, how does using single delete
> > > files interact with our commit logic? I would guess that a single
> delete
> > > file approach would make it more difficult to perform multiple deletes
> > > concurrently?
> > >
> > > Sent from my iPad
> > >
> > > On Oct 8, 2023, at 9:22 PM, Renjie Liu <[email protected]>
> wrote:
> > >
> > > 
> > > Hi, Ryan:
> > > Thanks for your reply.
> > >
> > > 1. What is the exact file format for these on disk that you're
> proposing?
> > >> Even if you're saying that it is what is produced by roaring bitmap,
> we
> > >> need more information. Is that a portable format? Do you wrap it at
> all in
> > >> the file to carry extra metadata? For example, the proposal says that
> a
> > >> starting position for a bitmap would be used. Where is that stored?
> > >
> > >
> > > Sorry for the confusion, by file format I mean roaring bitmap's file
> > > format <
> https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>.
> > > I checked that it has been implemented in several languages, such as
> java,
> > > go, rust, c. Metadata will be stored in manifest file as other entries
> such
> > > as datafile, deletion file. The starting position doesn't need to be
> stored
> > > since it's used by the file reader. I think your suggestion to provide
> an
> > > interface in design will make things clearer, and I will add it to the
> > > design doc.
> > >
> > > 2. How would DML operations work? Just a sketch would be great. I don't
> > >> think it is a good idea to leave the implications for DML fuzzy.
> > >
> > >
> > > I'll add sketches for other DML operations.
> > >
> > > 3. The comparison appears to be between rewriting data files and using
> > >> delete vectors. I think it needs to compare the existing delete file
> > >> formats to delete vectors so that we know why there is a benefit to
> doing
> > >> this beyond using the current positional delete files. The doc states
> that
> > >> there aren't measurements here, which I think we need. Otherwise,
> should we
> > >> just have a version of DML that produces one position delete per data
> file?
> > >
> > >
> > > I think deletion vector files are quite similar to position delete
> files,
> > > e.g. you can think of a deletion vector file as one position delete per
> > > data file. But this change brings new chances for optimization, and
> there
> > > is one section talking about it in the design doc. As with the
> > > measurements, I'll try to design some experiments for it.
> > >
> > > 4. I think this is missing some justification for how you're changing
> data
> > >> file metadata.
> > >
> > >
> > > I agree with your comment that if we associate one deletion vector
> with a
> > > data file, maybe it's better to extend the DataFile struct rather than
> > > introducing new entries.
> > >
> > > I'll update the doc to address the comments.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue <[email protected]> wrote:
> > >
> > >> Thanks, Renjie. I went through and made some comments about what is
> still
> > >> not clear. Here's a summary:
> > >>
> > >> 1. What is the exact file format for these on disk that you're
> proposing?
> > >> Even if you're saying that it is what is produced by roaring bitmap,
> we
> > >> need more information. Is that a portable format? Do you wrap it at
> all in
> > >> the file to carry extra metadata? For example, the proposal says that
> a
> > >> starting position for a bitmap would be used. Where is that stored?
> > >> 2. How would DML operations work? Just a sketch would be great. I
> don't
> > >> think it is a good idea to leave the implications for DML fuzzy.
> > >> 3. The comparison appears to be between rewriting data files and using
> > >> delete vectors. I think it needs to compare the existing delete file
> > >> formats to delete vectors so that we know why there is a benefit to
> doing
> > >> this beyond using the current positional delete files. The doc states
> that
> > >> there aren't measurements here, which I think we need. Otherwise,
> should we
> > >> just have a version of DML that produces one position delete per data
> file?
> > >> 4. I think this is missing some justification for how you're changing
> > >> data file metadata.
> > >>
> > >> On Sat, Oct 7, 2023 at 4:49 AM Renjie Liu <[email protected]>
> > >> wrote:
> > >>
> > >>> Hi:
> > >>> I have addressed most comments in the document. I would like to ask
> > >>> what's the next step? Should we have a vote on this spec to reject
> it or we
> > >>> should go on with it?
> > >>>
> > >>> On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu <[email protected]
> >
> > >>> wrote:
> > >>>
> > >>>> Hi:
> > >>>> Sorry for the late reply, I have been busy recently. I've updated
> the
> > >>>> design with more details about your questions, and here is a
> summary:
> > >>>>
> > >>>> > 1. Would there be only one delete vector per data file?
> > >>>> Yes. It's possible that we have multiple deletion vectors per very
> > >>>> large data file to further reduce write amplification, but I'm not
> sure if
> > >>>> it's over design.
> > >>>>
> > >>>> > 2. Would this require merge of existing vectors and new deletes at
> > >>>> write time?
> > >>>> Yes. Merging two bitmaps would be quite efficient.
> > >>>>
> > >>>> > 3. How would the data file for a vector be identified?
> > >>>> It will be stored in the manifest file. We will have one  entry for
> > >>>> deletion file, and we add an extra field `data_file_path` for the
> > >>>> associated data file path. See Changes to spec
> > >>>> <
> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.p4vrosjzl14j>
> for
> > >>>> details, and Write process
> > >>>> <
> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.tft7a34rd2be>
> for
> > >>>> example.
> > >>>>
> > >>>> > 4. If multiple vectors are allowed, what is the plan for keeping
> the
> > >>>> number of delete vectors small?
> > >>>> I see multiple vectors per data file as an optimization for very
> large
> > >>>> data file, and I'm not sure if it's over design.
> > >>>>
> > >>>> > 5. Would we allow writing multiple delete vectors into the same
> file?
> > >>>> I don't want to do that. Merging delete vectors into one file have
> two
> > >>>> concerns:
> > >>>>
> > >>>>    - Write amplification.
> > >>>>    - It makes concurrent modification of data files difficult.
> > >>>>
> > >>>> > 6. How would we track which files are affected by a combined file
> of
> > >>>> delete vectors?
> > >>>> Sorry, I don't quite get your point.
> > >>>>
> > >>>> > 7. What are the details of the proposed file format?
> > >>>> I think roaring bitmap would be a good candidate, but other columnar
> > >>>> formats such as parquet, orc are also possible since they provided
> great
> > >>>> compression for boolean columns. I've mentioned it here
> > >>>> <
> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.nrhcjanzai0v
> >
> > >>>>
> > >>>> On Thu, Sep 21, 2023 at 4:53 PM Jean-Baptiste Onofré <
> [email protected]>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Ryan,
> > >>>>>
> > >>>>> Thanks for the feedback. Unfortunately, I was not able to join the
> > >>>>> Iceberg community sync meeting yesterday, I promise I will join the
> > >>>>> next ones.
> > >>>>>
> > >>>>> I think the proposal is very interesting and also the
> > >>>>> discussion/comments in the document. I agree that some points
> should
> > >>>>> be discussed further. I propose to update the document with your
> > >>>>> points/questions.
> > >>>>>
> > >>>>> Thanks !
> > >>>>>
> > >>>>> Regards
> > >>>>> JB
> > >>>>>
> > >>>>> On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <[email protected]> wrote:
> > >>>>> >
> > >>>>> > Renjie, thanks for the proposal.
> > >>>>> >
> > >>>>> > We talked about this today in the Iceberg community sync and the
> > >>>>> general feedback was that we're excited work on this, but the
> proposal left
> > >>>>> a few areas unclear. There are a few decisions about how to manage
> the
> > >>>>> delete vectors that need to be added to the design. For example:
> > >>>>> > 1. Would there be only one delete vector per data file?
> > >>>>> > 2. Would this require merge of existing vectors and new deletes
> at
> > >>>>> write time?
> > >>>>> > 3. How would the data file for a vector be identified?
> > >>>>> > 4. If multiple vectors are allowed, what is the plan for keeping
> the
> > >>>>> number of delete vectors small?
> > >>>>> > 5. Would we allow writing multiple delete vectors into the same
> file?
> > >>>>> > 6. How would we track which files are affected by a combined
> file of
> > >>>>> delete vectors?
> > >>>>> > 7. What are the details of the proposed file format?
> > >>>>> >
> > >>>>> > In short, we just want to better understand how all this would
> work.
> > >>>>> >
> > >>>>> > Thanks!
> > >>>>> >
> > >>>>> > Ryan
> > >>>>> >
> > >>>>> >
> > >>>>> > On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <
> [email protected]>
> > >>>>> wrote:
> > >>>>> >>
> > >>>>> >> Hi, all:
> > >>>>> >>
> > >>>>> >>
> > >>>>> >>
> > >>>>> >> I have a proposal to introduce deletion vector file to reduce
> write
> > >>>>> amplification of iceberg table:
> > >>>>> >>
> > >>>>> >>
> > >>>>>
> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
> > >>>>> >>
> > >>>>> >>
> > >>>>> >>
> > >>>>> >> Welcome to comment, and looking forward to hear your advice.
> > >>>>> >
> > >>>>> >
> > >>>>> >
> > >>>>> > --
> > >>>>> > Ryan Blue
> > >>>>> > Tabular
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Renjie Liu
> > >>>> Software Engineer, MVAD
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Renjie Liu
> > >>> Software Engineer, MVAD
> > >>>
> > >>
> > >>
> > >> --
> > >> Ryan Blue
> > >> Tabular
> > >>
> > >
> > >
> > > --
> > > Renjie Liu
> > > Software Engineer, MVAD
> > >
> > >
> >
> > --
> > Renjie Liu
> > Software Engineer, MVAD
> >
>


-- 
Renjie Liu
Software Engineer, MVAD

Re: Proposal: Introduce deletion vector file to reduce write amplification

Reply via email to