Re: Proposal for RESTful Data Operations

2024-04-23 Thread Drew
Hey Everyone, Reviving this thread to provide a quick update on the fine-grained metadata append commits. Since we landed the ContentFiles spec, we can now return to the append logic. Following our discussions above and the feedback received on the content files PR: #9717

Re: Proposal for RESTful Data Operations

2024-02-28 Thread Ryan Blue
I’m not sure that there is a single tenet to follow, but I can outline how I think about the REST protocol. The problem that the REST API solves is to standardize catalog interaction for Iceberg. I think that relies on being both a good standard and a good API. A good standard is small, specific,

Re: Proposal for RESTful Data Operations

2024-02-26 Thread Jack Ye
> I don’t see how extending the REST protocol like this would make an impact on that problem. I realize maybe we should take a step back, and first align on the position of the REST protocol, before tapping further into what could be improved with delete files and CDC. Specifically, *what is the t

Re: Proposal for RESTful Data Operations

2024-02-21 Thread Ryan Blue
Okay, so it sounds like the motivation is to improve the story around CDC. That’s a good area to work on, but I don’t see how extending the REST protocol like this would make an impact on that problem. In addition, I’m not following your rationale for a few things, so we should probably take a look

Re: Proposal for RESTful Data Operations

2024-02-21 Thread Jack Ye
Thanks for the response Ryan! > The solution to the problem above is to add more to the API — maybe have a single endpoint that can delete and append files in a single commit. But then pushing this to the server requires that we also support validations to ensure the swap is valid when there are r

Re: Proposal for RESTful Data Operations

2024-02-21 Thread Ryan Blue
Thanks for pushing this forward, Drew and Jack! Jack just asked “how would such endpoints work with multi-table transactions?” — that demonstrates a big concern that I have about adding remove or delete file append endpoints. I don’t think that those endpoints can or should be used for transaction

Re: Proposal for RESTful Data Operations

2024-02-20 Thread Jack Ye
I think there is also a point we were discussing but never closed regarding AppendDeleteFiles, if that should be supported. The recent development in Kafka, and vendor products like Upsolver Zero-ETL

Re: Proposal for RESTful Data Operations

2024-02-20 Thread Drew
Hi everyone, We are discussing the REST spec changes to add support for DataFiles and DeleteFiles for both the append and scan planning APIs (PR: https://github.com/apache/iceberg/pull/9717). One thing that came up for appends was that this logic shouldn’t be in the table update API but instead it

Re: Proposal for RESTful Data Operations

2024-01-26 Thread Drew
Hey everyone, I wanted to provide a quick update on the progress of the commit API proposal. Based on the feedback in the design doc and the Slack conversation with Dan and Jack, we've reached an agreement that this is more of a fine-grained metadata commit, rather than a data operation or commit.

Re: Proposal for RESTful Data Operations

2024-01-12 Thread Drew
Hi everyone, I hope you all had great holidays! I wanted to resurface this proposal for RESTful Data operations. Currently, I have an open PR here: https://github.com/apache/iceberg/pull/9292 Thanks, Drew On Wed, Dec 13, 2023 at 3:04 PM Jack Ye wrote: > Thanks Drew for the quick turnaround, I

Re: Proposal for RESTful Data Operations

2023-12-13 Thread Jack Ye
Thanks Drew for the quick turnaround, I will take a deeper look into the PR. I think if we all agree that it is beneficial to have the AppendFiles(DataFile[]) API (maybe we should call it AppendRows instead), I would like to know if it also makes sense to have: 1. DeleteRows(DeleteFile[]), which c

Re: Proposal for RESTful Data Operations

2023-12-13 Thread Drew
Hi Ryan, Thanks for the feedback, I'll start going through the comments left in the doc! You're right in pointing out that the logic here can be simplified to roll back a commit. For now, I introduced a smaller PR that focuses on the append files operation. GitHub PR: https://github.com/apache/ic

Re: Proposal for RESTful Data Operations

2023-12-11 Thread Ryan Blue
> Based on my understanding of the proposal, I think it's more about the possibility of enabling other ways that do not require a full rollback. It's just that currently we implemented it as a rollback to prove the feasibility. My main question is this: what can be done besides rolling back a commit? A

Re: Proposal for RESTful Data Operations

2023-12-11 Thread Jack Ye
> The proposal is to roll back rewrite commits, but that's already possible with the much simpler API that exists today. Based on my understanding of the proposal, I think it's more about the possibility of enabling other ways that do not require a full rollback. It's just that currently we implemented

Re: Proposal for RESTful Data Operations

2023-12-08 Thread Ryan Blue
Thanks, Drew. I think it's a good idea in general to be able to perform commits on the server-side, but I would much rather break this down into smaller parts. I would definitely want to start with just file append use cases, since I think that is the biggest win. It can reduce retries and is an e

RE: Proposal for RESTful Data Operations

2023-12-08 Thread Gallardo, Drew
Regarding the multiple emails sent earlier, please use this one for discussions. Thank you! On 2023/12/07 00:47:42 Drew wrote: > Hi everyone, > > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon EMR > and Athena. I’m reaching out to share a proposal that introduces d