> The proposal is to roll back rewrite commits, but that's already possible with the much simpler API that exists today.

Based on my understanding of the proposal, I think it's more about enabling other approaches that do not require a full rollback. We currently implemented it as a rollback only to prove feasibility. But given that we now have full access to the changes in each data commit (rather than only the post-change snapshot), we could potentially reuse some of the files that have been rewritten.

> I'm skeptical that there is a benefit to implementing the set of data operations from the Java API

+1, the current Java API might be a bit redundant; some APIs serve very similar purposes. I feel the important data actions to have from the end user's perspective are basically the ability to (1) AddRows and (2) DeleteRows?

-Jack

On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io> wrote:

> Thanks, Drew.
>
> I think it's a good idea in general to be able to perform commits on the
> server-side, but I would much rather break this down into smaller parts. I
> would definitely want to start with just file append use cases, since I
> think that is the biggest win. It can reduce retries and is an easy way to
> write from non-JVM languages or just simpler applications.
>
> I'm skeptical that there is a benefit to implementing the set of data
> operations from the Java API. That's primarily because I don't think that
> use case 1 (better conflict resolution) is actually achieved. You can avoid
> retries on the client, but the retries must happen _somewhere_. The
> proposal is to roll back rewrite commits, but that's already possible with
> the much simpler API that exists today. Maybe I'm missing something?
>
> Even if I'm mistaken about being able to improve conflict resolution, I
> think that there is quite a bit of work here and I'd break this down either
> way. Starting with append use cases makes a lot of sense to me, but I'm
> interested to hear what others think as well.
>
> Ryan
>
> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
> wrote:
>
>> In regards to the multiple emails sent earlier, please use this one for
>> discussions.
>>
>> Thank you!
>>
>>
>> On 2023/12/07 00:47:42 Drew wrote:
>> > Hi everyone,
>> >
>> > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon EMR
>> > and Athena. I’m reaching out to share a proposal that introduces data
>> > commits as a part of the RESTCatalog. The current process for data commits
>> > lives on the client side, and by shifting this logic into the REST catalog,
>> > we can empower the catalog service with more control of this process.
>> >
>> > This proposal addresses specific use cases that showcase the benefits of
>> > moving the commit logic to the service side. For instance, this shift
>> > allows the user to refine conflict resolution mechanisms, giving precedence
>> > to operations that modify the table state to ensure their completion
>> > without conflict. Furthermore, our POC demonstrated an improvement in the
>> > success rate of concurrent write operations against the GlueCatalog. This
>> > all can be found in the detailed proposal below. Feel free to comment, and
>> > add your suggestions!
>> >
>> > Detailed proposal:
>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>> > Github POC: https://github.com/apache/iceberg/pull/9237
>> >
>> > Looking forward to hearing back
>> >
>> > Thanks,
>> >
>> > Drew Gallardo
>> > Amazon EMR & Athena
>> > d...@amazon.com
>> >
>
>
>
> --
> Ryan Blue
> Tabular
>