Thanks, Drew.

I think it's a good idea in general to be able to perform commits on the
server-side, but I would much rather break this down into smaller parts. I
would definitely want to start with just file append use cases, since I
think that is the biggest win. It can reduce retries and is an easy way to
write from non-JVM languages or just simpler applications.

I'm skeptical that there is a benefit to implementing the set of data
operations from the Java API. That's primarily because I don't think that
use case 1 (better conflict resolution) is actually achieved. You can avoid
retries on the client, but the retries must happen _somewhere_. The
proposal is to roll back rewrite commits, but that's already possible with
the much simpler API that exists today. Maybe I'm missing something?

Even if I'm mistaken about being able to improve conflict resolution, I
think that there is quite a bit of work here and I'd break this down either
way. Starting with append use cases makes a lot of sense to me, but I'm
interested to hear what others think as well.

Ryan

On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
wrote:

> Regarding the multiple emails sent earlier, please use this one for
> discussions.
>
> Thank you!
>
>
> On 2023/12/07 00:47:42 Drew wrote:
> > Hi everyone,
> >
> > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon EMR
> > and Athena. I’m reaching out to share a proposal that introduces data
> > commits as part of the RESTCatalog. The current process for data commits
> > lives on the client side; by shifting this logic into the REST catalog,
> > we can give the catalog service more control over this process.
> >
> > This proposal addresses specific use cases that show the benefits of
> > moving the commit logic to the service side. For instance, this shift
> > allows the user to refine conflict resolution mechanisms, giving precedence
> > to operations that modify the table state so that they complete without
> > conflict. Furthermore, our POC demonstrated an improvement in the success
> > rate of concurrent write operations against the GlueCatalog. All of this
> > is covered in the detailed proposal below. Feel free to comment and add
> > your suggestions!
> >
> > Detailed proposal:
> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
> > Github POC: https://github.com/apache/iceberg/pull/9237
> >
> > Looking forward to hearing back
> >
> > Thanks,
> >
> > Drew Gallardo
> > Amazon EMR & Athena
> > d...@amazon.com
> >



-- 
Ryan Blue
Tabular
