> Based on my understanding of the proposal, I think it's more about the possibility of enabling other ways that do not require a full rollback. It's just that we currently implemented it as a rollback to prove the feasibility.
My main question is this: what can be done besides rolling back a commit? And why does that require 5 extra routes and metadata writes from the REST service?

On Mon, Dec 11, 2023 at 11:27 AM Jack Ye <yezhao...@gmail.com> wrote:

> > The proposal is to roll back rewrite commits, but that's already
> > possible with the much simpler API that exists today.
>
> Based on my understanding of the proposal, I think it's more about the
> possibility of enabling other ways that do not require a full rollback.
> It's just that we currently implemented it as a rollback to prove the
> feasibility. But given that now we have full access to the changes of each
> data commit (compared to only the post-change snapshot), we could
> potentially reuse some files that have been rewritten.
>
> > I'm skeptical that there is a benefit to implementing the set of data
> > operations from the Java API
>
> +1, the current Java API might be a bit redundant; some APIs serve very
> similar purposes. I feel the important data actions to have from the end
> user's perspective are basically the ability to (1) AddRows, (2)
> DeleteRows?
>
> -Jack
>
> On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Drew.
>>
>> I think it's a good idea in general to be able to perform commits on the
>> server-side, but I would much rather break this down into smaller parts. I
>> would definitely want to start with just file append use cases, since I
>> think that is the biggest win. It can reduce retries and is an easy way to
>> write from non-JVM languages or just simpler applications.
>>
>> I'm skeptical that there is a benefit to implementing the set of data
>> operations from the Java API. That's primarily because I don't think that
>> use case 1 (better conflict resolution) is actually achieved. You can avoid
>> retries on the client, but the retries must happen _somewhere_.
>> The proposal is to roll back rewrite commits, but that's already possible
>> with the much simpler API that exists today. Maybe I'm missing something?
>>
>> Even if I'm mistaken about being able to improve conflict resolution, I
>> think that there is quite a bit of work here and I'd break this down either
>> way. Starting with append use cases makes a lot of sense to me, but I'm
>> interested to hear what others think as well.
>>
>> Ryan
>>
>> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
>> wrote:
>>
>>> Regarding the multiple emails sent earlier, please use this one for
>>> discussions.
>>>
>>> Thank you!
>>>
>>>
>>> On 2023/12/07 00:47:42 Drew wrote:
>>> > Hi everyone,
>>> >
>>> > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon
>>> > EMR and Athena. I’m reaching out to share a proposal that introduces
>>> > data commits as a part of the RESTCatalog. The current process for data
>>> > commits lives on the client side, and by shifting this logic into the
>>> > REST catalog, we can empower the catalog service with more control of
>>> > this process.
>>> >
>>> > This proposal addresses specific use cases that showcase the benefits
>>> > of moving the commit logic to the service side. For instance, this shift
>>> > allows the user to refine conflict resolution mechanisms, giving
>>> > precedence to operations that modify the table state to ensure their
>>> > completion without conflict. Furthermore, our POC demonstrated an
>>> > improvement in the success rate of concurrent write operations against
>>> > the GlueCatalog. This all can be found in the detailed proposal below.
>>> > Feel free to comment and add your suggestions!
>>> >
>>> > Detailed proposal:
>>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>>> > GitHub POC: https://github.com/apache/iceberg/pull/9237
>>> >
>>> > Looking forward to hearing back.
>>> >
>>> > Thanks,
>>> >
>>> > Drew Gallardo
>>> > Amazon EMR & Athena
>>> > d...@amazon.com
>>> >
>>
>> --
>> Ryan Blue
>> Tabular
>

--
Ryan Blue
Tabular