Re: Proposal: Introduce deletion vector file to reduce write amplification

Anton Okolnychyi Wed, 11 Oct 2023 09:11:15 -0700

I tried to summarize notes from our previous discussions here:
https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/


I am going to iterate on the doc later today.

On 2023/10/11 07:06:07 Renjie Liu wrote:
> Hi, Russell:
> 
> 
> > The main things I’m still interested are alternative approaches. I think
> > that some of the work that Anton is working on have shown some different
> > bottlenecks in applying delete files that I’m not sure are addressed by
> > this proposal.
> 
> 
> I'm also interested. Could you share some resources on the work that Anton
> is working? I didn't notice that.
> 
> For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > delete file application in order to speed up planning. But this could as be
> > done with a puffin file indexing delete files to data files. This would
> > eliminate any planning cost while also allowing us to do more complicated
> > things like mapping multiple data files to a single delete file as well as
> > operate on a one to many data file to delete file approach. Doing this
> > approach would mean we would need to change any existing metadata or
> > introduce a new separate file type.
> 
> 
> Yes, we can improve planning performance by embedding the mapping in a
> puffin file. But I guess this may introduce other problems like conflicting
> when doing commits? IIUC, puffin file is used as table level index or
> statistics.
> 
> I would also expect some POC experiments showing that the Spec is getting
> > the benefit’s that are hypothesized.
> 
> 
>  I will conduct some poc experiments with actual data, but it may take some
> time to implement it.
> 
> The proposal I think also needs to address any possible limitations with
> > this approach. They don’t all need to be solved but we should at least
> > being exploring them. As a quick example, how does using single delete
> > files interact with our commit logic? I would guess that a single delete
> > file approach would make it more difficult to perform multiple deletes
> > concurrently?
> 
> 
> Good suggestion, I'm working on updating the doc to completement sketches
> for dml operations. IIUC, for potential conflicts in performing multiple
> deletes concurrently, you mean concurrent writes from different dml jobs?
> If so, I think the current solution still has the same problem since this
> is in fact conflicts from concurrent updates. But I do admit that the
> deletion vector approach makes confliction easier since it's file level.
> 
> 
> 
> On Tue, Oct 10, 2023 at 8:54 AM Russell Spitzer <[email protected]>
> wrote:
> 
> > The main things I’m still interested are alternative approaches. I think
> > that some of the work that Anton is working on have shown some different
> > bottlenecks in applying delete files that I’m not sure are addressed by
> > this proposal.
> >
> > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1)
> > delete file application in order to speed up planning. But this could as be
> > done with a puffin file indexing delete files to data files. This would
> > eliminate any planning cost while also allowing us to do more complicated
> > things like mapping multiple data files to a single delete file as well as
> > operate on a one to many data file to delete file approach. Doing this
> > approach would mean we would need to change any existing metadata or
> > introduce a new separate file type.
> >
> > I think basically for every “benefit” outlined we should think about if
> > there is an alternative approach that would achieve the same benefit. Then
> > we should analyze or whether or not the proposal is the best solution for
> > that particular benefit and do some work to calculate what that benefit
> > would be and what drawbacks there might be.
> >
> > I would also expect some POC experiments showing that the Spec is getting
> > the benefit’s that are hypothesized.
> >
> > The proposal I think also needs to address any possible limitations with
> > this approach. They don’t all need to be solved but we should at least
> > being exploring them. As a quick example, how does using single delete
> > files interact with our commit logic? I would guess that a single delete
> > file approach would make it more difficult to perform multiple deletes
> > concurrently?
> >
> > Sent from my iPad
> >
> > On Oct 8, 2023, at 9:22 PM, Renjie Liu <[email protected]> wrote:
> >
> > 
> > Hi, Ryan:
> > Thanks for your reply.
> >
> > 1. What is the exact file format for these on disk that you're proposing?
> >> Even if you're saying that it is what is produced by roaring bitmap, we
> >> need more information. Is that a portable format? Do you wrap it at all in
> >> the file to carry extra metadata? For example, the proposal says that a
> >> starting position for a bitmap would be used. Where is that stored?
> >
> >
> > Sorry for the confusion, by file format I mean roaring bitmap's file
> > format <https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>.
> > I checked that it has been implemented in several languages, such as java,
> > go, rust, c. Metadata will be stored in manifest file as other entries such
> > as datafile, deletion file. The starting position doesn't need to be stored
> > since it's used by the file reader. I think your suggestion to provide an
> > interface in design will make things clearer, and I will add it to the
> > design doc.
> >
> > 2. How would DML operations work? Just a sketch would be great. I don't
> >> think it is a good idea to leave the implications for DML fuzzy.
> >
> >
> > I'll add sketches for other DML operations.
> >
> > 3. The comparison appears to be between rewriting data files and using
> >> delete vectors. I think it needs to compare the existing delete file
> >> formats to delete vectors so that we know why there is a benefit to doing
> >> this beyond using the current positional delete files. The doc states that
> >> there aren't measurements here, which I think we need. Otherwise, should we
> >> just have a version of DML that produces one position delete per data file?
> >
> >
> > I think deletion vector files are quite similar to position delete files,
> > e.g. you can think of a deletion vector file as one position delete per
> > data file. But this change brings new chances for optimization, and there
> > is one section talking about it in the design doc. As with the
> > measurements, I'll try to design some experiments for it.
> >
> > 4. I think this is missing some justification for how you're changing data
> >> file metadata.
> >
> >
> > I agree with your comment that if we associate one deletion vector with a
> > data file, maybe it's better to extend the DataFile struct rather than
> > introducing new entries.
> >
> > I'll update the doc to address the comments.
> >
> >
> >
> >
> >
> >
> > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue <[email protected]> wrote:
> >
> >> Thanks, Renjie. I went through and made some comments about what is still
> >> not clear. Here's a summary:
> >>
> >> 1. What is the exact file format for these on disk that you're proposing?
> >> Even if you're saying that it is what is produced by roaring bitmap, we
> >> need more information. Is that a portable format? Do you wrap it at all in
> >> the file to carry extra metadata? For example, the proposal says that a
> >> starting position for a bitmap would be used. Where is that stored?
> >> 2. How would DML operations work? Just a sketch would be great. I don't
> >> think it is a good idea to leave the implications for DML fuzzy.
> >> 3. The comparison appears to be between rewriting data files and using
> >> delete vectors. I think it needs to compare the existing delete file
> >> formats to delete vectors so that we know why there is a benefit to doing
> >> this beyond using the current positional delete files. The doc states that
> >> there aren't measurements here, which I think we need. Otherwise, should we
> >> just have a version of DML that produces one position delete per data file?
> >> 4. I think this is missing some justification for how you're changing
> >> data file metadata.
> >>
> >> On Sat, Oct 7, 2023 at 4:49 AM Renjie Liu <[email protected]>
> >> wrote:
> >>
> >>> Hi:
> >>> I have addressed most comments in the document. I would like to ask
> >>> what's the next step? Should we have a vote on this spec to reject it or 
> >>> we
> >>> should go on with it?
> >>>
> >>> On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi:
> >>>> Sorry for the late reply, I have been busy recently. I've updated the
> >>>> design with more details about your questions, and here is a summary:
> >>>>
> >>>> > 1. Would there be only one delete vector per data file?
> >>>> Yes. It's possible that we have multiple deletion vectors per very
> >>>> large data file to further reduce write amplification, but I'm not sure 
> >>>> if
> >>>> it's over design.
> >>>>
> >>>> > 2. Would this require merge of existing vectors and new deletes at
> >>>> write time?
> >>>> Yes. Merging two bitmaps would be quite efficient.
> >>>>
> >>>> > 3. How would the data file for a vector be identified?
> >>>> It will be stored in the manifest file. We will have one  entry for
> >>>> deletion file, and we add an extra field `data_file_path` for the
> >>>> associated data file path. See Changes to spec
> >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.p4vrosjzl14j>
> >>>>  for
> >>>> details, and Write process
> >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.tft7a34rd2be>
> >>>>  for
> >>>> example.
> >>>>
> >>>> > 4. If multiple vectors are allowed, what is the plan for keeping the
> >>>> number of delete vectors small?
> >>>> I see multiple vectors per data file as an optimization for very large
> >>>> data file, and I'm not sure if it's over design.
> >>>>
> >>>> > 5. Would we allow writing multiple delete vectors into the same file?
> >>>> I don't want to do that. Merging delete vectors into one file have two
> >>>> concerns:
> >>>>
> >>>>    - Write amplification.
> >>>>    - It makes concurrent modification of data files difficult.
> >>>>
> >>>> > 6. How would we track which files are affected by a combined file of
> >>>> delete vectors?
> >>>> Sorry, I don't quite get your point.
> >>>>
> >>>> > 7. What are the details of the proposed file format?
> >>>> I think roaring bitmap would be a good candidate, but other columnar
> >>>> formats such as parquet, orc are also possible since they provided great
> >>>> compression for boolean columns. I've mentioned it here
> >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.nrhcjanzai0v>
> >>>>
> >>>> On Thu, Sep 21, 2023 at 4:53 PM Jean-Baptiste Onofré <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hi Ryan,
> >>>>>
> >>>>> Thanks for the feedback. Unfortunately, I was not able to join the
> >>>>> Iceberg community sync meeting yesterday, I promise I will join the
> >>>>> next ones.
> >>>>>
> >>>>> I think the proposal is very interesting and also the
> >>>>> discussion/comments in the document. I agree that some points should
> >>>>> be discussed further. I propose to update the document with your
> >>>>> points/questions.
> >>>>>
> >>>>> Thanks !
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <[email protected]> wrote:
> >>>>> >
> >>>>> > Renjie, thanks for the proposal.
> >>>>> >
> >>>>> > We talked about this today in the Iceberg community sync and the
> >>>>> general feedback was that we're excited work on this, but the proposal 
> >>>>> left
> >>>>> a few areas unclear. There are a few decisions about how to manage the
> >>>>> delete vectors that need to be added to the design. For example:
> >>>>> > 1. Would there be only one delete vector per data file?
> >>>>> > 2. Would this require merge of existing vectors and new deletes at
> >>>>> write time?
> >>>>> > 3. How would the data file for a vector be identified?
> >>>>> > 4. If multiple vectors are allowed, what is the plan for keeping the
> >>>>> number of delete vectors small?
> >>>>> > 5. Would we allow writing multiple delete vectors into the same file?
> >>>>> > 6. How would we track which files are affected by a combined file of
> >>>>> delete vectors?
> >>>>> > 7. What are the details of the proposed file format?
> >>>>> >
> >>>>> > In short, we just want to better understand how all this would work.
> >>>>> >
> >>>>> > Thanks!
> >>>>> >
> >>>>> > Ryan
> >>>>> >
> >>>>> >
> >>>>> > On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <[email protected]>
> >>>>> wrote:
> >>>>> >>
> >>>>> >> Hi, all:
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> I have a proposal to introduce deletion vector file to reduce write
> >>>>> amplification of iceberg table:
> >>>>> >>
> >>>>> >>
> >>>>> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing
> >>>>> >>
> >>>>> >>
> >>>>> >>
> >>>>> >> Welcome to comment, and looking forward to hear your advice.
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > --
> >>>>> > Ryan Blue
> >>>>> > Tabular
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Renjie Liu
> >>>> Software Engineer, MVAD
> >>>>
> >>>
> >>>
> >>> --
> >>> Renjie Liu
> >>> Software Engineer, MVAD
> >>>
> >>
> >>
> >> --
> >> Ryan Blue
> >> Tabular
> >>
> >
> >
> > --
> > Renjie Liu
> > Software Engineer, MVAD
> >
> >
> 
> -- 
> Renjie Liu
> Software Engineer, MVAD
>

Re: Proposal: Introduce deletion vector file to reduce write amplification

Reply via email to