Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-27 Thread Renjie Liu
Hi, Anton: I've gone through the doc, and the Puffin Position Delete Files section shares some similarity with the deletion vector approach. Is there any conclusion about the discussion? On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi wrote: > I tried to summarize notes from our previous discu

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-13 Thread Jean-Baptiste Onofré
That's a fair point and I agree on that. I think having some kind of performance numbers and comparison would be helpful (depending on the use cases). Regards JB On Fri, Oct 13, 2023 at 7:27 AM Renjie Liu wrote: >> >> I'd say we should equally consider all incoming ideas at the moment and see >>

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-12 Thread Renjie Liu
> > I'd say we should equally consider all incoming ideas at the moment and > see what would work best. We haven't agreed on anything yet, so I'd > encourage everyone to participate in the discussion. Can't agree more. I think we share the same goal to improve performance of iceberg, and welcome

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-12 Thread Anton Okolnychyi
I just realized I did not open comments, fixed now. Any feedback or alternative ideas are more than welcome! I'd say we should equally consider all incoming ideas at the moment and see what would work best. We haven't agreed on anything yet, so I'd encourage everyone to participate in the discu

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-11 Thread Renjie Liu
Hi, Anton: I've gone through the doc, and we are trying to solve the same problems of position deletes, but with different approaches. It's quite interesting. On Thu, Oct 12, 2023 at 12:11 AM Anton Okolnychyi wrote: > I tried to summarize notes from our previous discussions here: > > https://doc

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-11 Thread Anton Okolnychyi
I tried to summarize notes from our previous discussions here: https://docs.google.com/document/d/1M4L6o-qnGRwGhbhkW8BnravoTwvCrJV8VvzVQDRJO5I/ I am going to iterate on the doc later today. On 2023/10/11 07:06:07 Renjie Liu wrote: > Hi, Russell: > > > > The main things I’m still interested are

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-11 Thread Renjie Liu
Hi, Russell: > The main things I’m still interested are alternative approaches. I think > that some of the work that Anton is working on have shown some different > bottlenecks in applying delete files that I’m not sure are addressed by > this proposal. I'm also interested. Could you share some

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-09 Thread Russell Spitzer
The main things I’m still interested are alternative approaches. I think that some of the work that Anton is working on have shown some different bottlenecks in applying delete files that I’m not sure are addressed by this proposal. For example, this proposal suggests doing a 1 to 1 (or 1 rowgroup t

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-08 Thread Renjie Liu
Hi, Ryan: Thanks for your reply. 1. What is the exact file format for these on disk that you're proposing? > Even if you're saying that it is what is produced by roaring bitmap, we > need more information. Is that a portable format? Do you wrap it at all in > the file to carry extra metadata? For

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-08 Thread Ryan Blue
Thanks, Renjie. I went through and made some comments about what is still not clear. Here's a summary: 1. What is the exact file format for these on disk that you're proposing? Even if you're saying that it is what is produced by roaring bitmap, we need more information. Is that a portable format?

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-10-07 Thread Renjie Liu
Hi: I have addressed most comments in the document. I would like to ask what the next step is. Should we have a vote on this spec to reject it, or should we go on with it? On Sat, Sep 30, 2023 at 11:20 PM Renjie Liu wrote: > Hi: > Sorry for the late reply, I have been busy recently. I've updated t

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-09-30 Thread Renjie Liu
Hi: Sorry for the late reply, I have been busy recently. I've updated the design with more details about your questions, and here is a summary: > 1. Would there be only one delete vector per data file? Yes. It's possible that we have multiple deletion vectors per very large data file to further re

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-09-21 Thread Jean-Baptiste Onofré
Hi Ryan, Thanks for the feedback. Unfortunately, I was not able to join the Iceberg community sync meeting yesterday, I promise I will join the next ones. I think the proposal is very interesting and also the discussion/comments in the document. I agree that some points should be discussed furthe

Re: Proposal: Introduce deletion vector file to reduce write amplification

2023-09-20 Thread Ryan Blue
Renjie, thanks for the proposal. We talked about this today in the Iceberg community sync and the general feedback was that we're excited to work on this, but the proposal left a few areas unclear. There are a few decisions about how to manage the delete vectors that need to be added to the design. F

Proposal: Introduce deletion vector file to reduce write amplification

2023-09-18 Thread Renjie Liu
Hi, all: I have a proposal to introduce a deletion vector file to reduce the write amplification of Iceberg tables: https://docs.google.com/document/d/1FtPI0TUzMrPAFfWX_CA9NL6m6O1uNSxlpDsR-7xpPL0/edit?usp=sharing Comments are welcome, and I look forward to hearing your advice.