I tried to summarize notes from our previous discussions here: https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/
I am going to iterate on the doc later today. On 2023/10/11 07:06:07 Renjie Liu wrote: > Hi, Russell: > > > > The main things I’m still interested are alternative approaches. I think > > that some of the work that Anton is working on have shown some different > > bottlenecks in applying delete files that I’m not sure are addressed by > > this proposal. > > > I'm also interested. Could you share some resources on the work that Anton > is working? I didn't notice that. > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1) > > delete file application in order to speed up planning. But this could as be > > done with a puffin file indexing delete files to data files. This would > > eliminate any planning cost while also allowing us to do more complicated > > things like mapping multiple data files to a single delete file as well as > > operate on a one to many data file to delete file approach. Doing this > > approach would mean we would need to change any existing metadata or > > introduce a new separate file type. > > > Yes, we can improve planning performance by embedding the mapping in a > puffin file. But I guess this may introduce other problems like conflicting > when doing commits? IIUC, puffin file is used as table level index or > statistics. > > I would also expect some POC experiments showing that the Spec is getting > > the benefit’s that are hypothesized. > > > I will conduct some poc experiments with actual data, but it may take some > time to implement it. > > The proposal I think also needs to address any possible limitations with > > this approach. They don’t all need to be solved but we should at least > > being exploring them. As a quick example, how does using single delete > > files interact with our commit logic? I would guess that a single delete > > file approach would make it more difficult to perform multiple deletes > > concurrently? > > > Good suggestion, I'm working on updating the doc to completement sketches > for dml operations. IIUC, for potential conflicts in performing multiple > deletes concurrently, you mean concurrent writes from different dml jobs? > If so, I think the current solution still has the same problem since this > is in fact conflicts from concurrent updates. But I do admit that the > deletion vector approach makes confliction easier since it's file level. > > > > On Tue, Oct 10, 2023 at 8:54 AM Russell Spitzer <russell.spit...@gmail.com> > wrote: > > > The main things I’m still interested are alternative approaches. I think > > that some of the work that Anton is working on have shown some different > > bottlenecks in applying delete files that I’m not sure are addressed by > > this proposal. > > > > For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup to 1) > > delete file application in order to speed up planning. But this could as be > > done with a puffin file indexing delete files to data files. This would > > eliminate any planning cost while also allowing us to do more complicated > > things like mapping multiple data files to a single delete file as well as > > operate on a one to many data file to delete file approach. Doing this > > approach would mean we would need to change any existing metadata or > > introduce a new separate file type. > > > > I think basically for every “benefit” outlined we should think about if > > there is an alternative approach that would achieve the same benefit. Then > > we should analyze or whether or not the proposal is the best solution for > > that particular benefit and do some work to calculate what that benefit > > would be and what drawbacks there might be. > > > > I would also expect some POC experiments showing that the Spec is getting > > the benefit’s that are hypothesized. > > > > The proposal I think also needs to address any possible limitations with > > this approach. They don’t all need to be solved but we should at least > > being exploring them. As a quick example, how does using single delete > > files interact with our commit logic? I would guess that a single delete > > file approach would make it more difficult to perform multiple deletes > > concurrently? > > > > Sent from my iPad > > > > On Oct 8, 2023, at 9:22 PM, Renjie Liu <liurenjie2...@gmail.com> wrote: > > > > > > Hi, Ryan: > > Thanks for your reply. > > > > 1. What is the exact file format for these on disk that you're proposing? > >> Even if you're saying that it is what is produced by roaring bitmap, we > >> need more information. Is that a portable format? Do you wrap it at all in > >> the file to carry extra metadata? For example, the proposal says that a > >> starting position for a bitmap would be used. Where is that stored? > > > > > > Sorry for the confusion, by file format I mean roaring bitmap's file > > format <https://github.com/RoaringBitmap/RoaringFormatSpec#general-layout>. > > I checked that it has been implemented in several languages, such as java, > > go, rust, c. Metadata will be stored in manifest file as other entries such > > as datafile, deletion file. The starting position doesn't need to be stored > > since it's used by the file reader. I think your suggestion to provide an > > interface in design will make things clearer, and I will add it to the > > design doc. > > > > 2. How would DML operations work? Just a sketch would be great. I don't > >> think it is a good idea to leave the implications for DML fuzzy. > > > > > > I'll add sketches for other DML operations. > > > > 3. The comparison appears to be between rewriting data files and using > >> delete vectors. I think it needs to compare the existing delete file > >> formats to delete vectors so that we know why there is a benefit to doing > >> this beyond using the current positional delete files. The doc states that > >> there aren't measurements here, which I think we need. Otherwise, should we > >> just have a version of DML that produces one position delete per data file? > > > > > > I think deletion vector files are quite similar to position delete files, > > e.g. you can think of a deletion vector file as one position delete per > > data file. But this change brings new chances for optimization, and there > > is one section talking about it in the design doc. As with the > > measurements, I'll try to design some experiments for it. > > > > 4. I think this is missing some justification for how you're changing data > >> file metadata. > > > > > > I agree with your comment that if we associate one deletion vector with a > > data file, maybe it's better to extend the DataFile struct rather than > > introducing new entries. > > > > I'll update the doc to address the comments. > > > > > > > > > > > > > > On Mon, Oct 9, 2023 at 1:44 AM Ryan Blue <b...@tabular.io> wrote: > > > >> Thanks, Renjie. I went through and made some comments about what is still > >> not clear. Here's a summary: > >> > >> 1. What is the exact file format for these on disk that you're proposing? > >> Even if you're saying that it is what is produced by roaring bitmap, we > >> need more information. Is that a portable format? Do you wrap it at all in > >> the file to carry extra metadata? For example, the proposal says that a > >> starting position for a bitmap would be used. Where is that stored? > >> 2. How would DML operations work? Just a sketch would be great. I don't > >> think it is a good idea to leave the implications for DML fuzzy. > >> 3. The comparison appears to be between rewriting data files and using > >> delete vectors. I think it needs to compare the existing delete file > >> formats to delete vectors so that we know why there is a benefit to doing > >> this beyond using the current positional delete files. The doc states that > >> there aren't measurements here, which I think we need. Otherwise, should we > >> just have a version of DML that produces one position delete per data file? > >> 4. I think this is missing some justification for how you're changing > >> data file metadata. > >> > >> On Sat, Oct 7, 2023 at 4:49 AM Renjie Liu <liurenjie2...@gmail.com> > >> wrote: > >> > >>> Hi: > >>> I have addressed most comments in the document. I would like to ask > >>> what's the next step? Should we have a vote on this spec to reject it or > >>> we > >>> should go on with it? > >>> > >>> On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu <liurenjie2...@gmail.com> > >>> wrote: > >>> > >>>> Hi: > >>>> Sorry for the late reply, I have been busy recently. I've updated the > >>>> design with more details about your questions, and here is a summary: > >>>> > >>>> > 1. Would there be only one delete vector per data file? > >>>> Yes. It's possible that we have multiple deletion vectors per very > >>>> large data file to further reduce write amplification, but I'm not sure > >>>> if > >>>> it's over design. > >>>> > >>>> > 2. Would this require merge of existing vectors and new deletes at > >>>> write time? > >>>> Yes. Merging two bitmaps would be quite efficient. > >>>> > >>>> > 3. How would the data file for a vector be identified? > >>>> It will be stored in the manifest file. We will have one entry for > >>>> deletion file, and we add an extra field `data_file_path` for the > >>>> associated data file path. See Changes to spec > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.p4vrosjzl14j> > >>>> for > >>>> details, and Write process > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.tft7a34rd2be> > >>>> for > >>>> example. > >>>> > >>>> > 4. If multiple vectors are allowed, what is the plan for keeping the > >>>> number of delete vectors small? > >>>> I see multiple vectors per data file as an optimization for very large > >>>> data file, and I'm not sure if it's over design. > >>>> > >>>> > 5. Would we allow writing multiple delete vectors into the same file? > >>>> I don't want to do that. Merging delete vectors into one file have two > >>>> concerns: > >>>> > >>>> - Write amplification. > >>>> - It makes concurrent modification of data files difficult. > >>>> > >>>> > 6. How would we track which files are affected by a combined file of > >>>> delete vectors? > >>>> Sorry, I don't quite get your point. > >>>> > >>>> > 7. What are the details of the proposed file format? > >>>> I think roaring bitmap would be a good candidate, but other columnar > >>>> formats such as parquet, orc are also possible since they provided great > >>>> compression for boolean columns. I've mentioned it here > >>>> <https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit#heading=h.nrhcjanzai0v> > >>>> > >>>> On Thu, Sep 21, 2023 at 4:53 PM Jean-Baptiste Onofré <j...@nanthrax.net> > >>>> wrote: > >>>> > >>>>> Hi Ryan, > >>>>> > >>>>> Thanks for the feedback. Unfortunately, I was not able to join the > >>>>> Iceberg community sync meeting yesterday, I promise I will join the > >>>>> next ones. > >>>>> > >>>>> I think the proposal is very interesting and also the > >>>>> discussion/comments in the document. I agree that some points should > >>>>> be discussed further. I propose to update the document with your > >>>>> points/questions. > >>>>> > >>>>> Thanks ! > >>>>> > >>>>> Regards > >>>>> JB > >>>>> > >>>>> On Thu, Sep 21, 2023 at 2:02 AM Ryan Blue <b...@tabular.io> wrote: > >>>>> > > >>>>> > Renjie, thanks for the proposal. > >>>>> > > >>>>> > We talked about this today in the Iceberg community sync and the > >>>>> general feedback was that we're excited work on this, but the proposal > >>>>> left > >>>>> a few areas unclear. There are a few decisions about how to manage the > >>>>> delete vectors that need to be added to the design. For example: > >>>>> > 1. Would there be only one delete vector per data file? > >>>>> > 2. Would this require merge of existing vectors and new deletes at > >>>>> write time? > >>>>> > 3. How would the data file for a vector be identified? > >>>>> > 4. If multiple vectors are allowed, what is the plan for keeping the > >>>>> number of delete vectors small? > >>>>> > 5. Would we allow writing multiple delete vectors into the same file? > >>>>> > 6. How would we track which files are affected by a combined file of > >>>>> delete vectors? > >>>>> > 7. What are the details of the proposed file format? > >>>>> > > >>>>> > In short, we just want to better understand how all this would work. > >>>>> > > >>>>> > Thanks! > >>>>> > > >>>>> > Ryan > >>>>> > > >>>>> > > >>>>> > On Mon, Sep 18, 2023 at 8:22 PM Renjie Liu <liurenjie2...@gmail.com> > >>>>> wrote: > >>>>> >> > >>>>> >> Hi, all: > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> I have a proposal to introduce deletion vector file to reduce write > >>>>> amplification of iceberg table: > >>>>> >> > >>>>> >> > >>>>> https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> Welcome to comment, and looking forward to hear your advice. > >>>>> > > >>>>> > > >>>>> > > >>>>> > -- > >>>>> > Ryan Blue > >>>>> > Tabular > >>>>> > >>>> > >>>> > >>>> -- > >>>> Renjie Liu > >>>> Software Engineer, MVAD > >>>> > >>> > >>> > >>> -- > >>> Renjie Liu > >>> Software Engineer, MVAD > >>> > >> > >> > >> -- > >> Ryan Blue > >> Tabular > >> > > > > > > -- > > Renjie Liu > > Software Engineer, MVAD > > > > > > -- > Renjie Liu > Software Engineer, MVAD >