> The proposal is to roll back rewrite commits, but that's already possible
> with the much simpler API that exists today.

Based on my understanding of the proposal, I think it's more about enabling
other approaches that do not require a full rollback. We currently implement
it as a rollback only to prove feasibility. But given that we now have full
access to the changes in each data commit (rather than only the post-change
snapshot), we could potentially reuse some of the files that have been
rewritten.
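
To illustrate what reusing rewritten files could look like, here is a rough
sketch. It is not from the proposal or the POC; RewriteGroup and
salvage_rewrite are hypothetical names, and it assumes the server can see
which source files each rewritten file replaced. If a concurrent commit
touched only some of the source files, only the affected groups would need to
be discarded:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteGroup:
    """Hypothetical unit of a rewrite commit: source files and their replacements."""
    replaced: frozenset
    added: frozenset


def salvage_rewrite(groups, concurrently_changed):
    """Split rewrite groups into (still valid, conflicting).

    A group conflicts if a concurrent commit touched any file it replaced.
    """
    keep, drop = [], []
    for group in groups:
        if group.replaced & concurrently_changed:
            drop.append(group)
        else:
            keep.append(group)
    return keep, drop


groups = [
    RewriteGroup(frozenset({"a.parquet", "b.parquet"}), frozenset({"c.parquet"})),
    RewriteGroup(frozenset({"d.parquet", "e.parquet"}), frozenset({"f.parquet"})),
]
# A concurrent delete touched a.parquet, so only the first group is invalidated.
keep, drop = salvage_rewrite(groups, concurrently_changed={"a.parquet"})
```

With only the post-change snapshot, the server would see that the rewrite
conflicts but not which part of it is still safe to keep.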

> I'm skeptical that there is a benefit to implementing the set of data
> operations from the Java API

+1, the current Java API might be a bit redundant; some APIs serve very
similar purposes. I feel the important data actions to have from the end
user's perspective are basically (1) AddRows and (2) DeleteRows?
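
As a rough sketch of how small that surface could be (the class names, field
names, and payload shape below are all hypothetical and are not part of the
REST catalog spec or the Java API):

```python
from dataclasses import asdict, dataclass, field
from typing import List, Union


@dataclass
class AddRows:
    """Hypothetical action: append rows contained in new data files."""
    data_files: List[str]


@dataclass
class DeleteRows:
    """Hypothetical action: remove rows identified by delete files."""
    delete_files: List[str]


Action = Union[AddRows, DeleteRows]


@dataclass
class DataCommitRequest:
    """Hypothetical request body for a server-side data commit."""
    table: str
    actions: List[Action] = field(default_factory=list)


def to_payload(request):
    """Serialize the request into a plain dict, tagging each action by type."""
    return {
        "table": request.table,
        "actions": [
            {"type": type(action).__name__, **asdict(action)}
            for action in request.actions
        ],
    }


request = DataCommitRequest(
    table="db.events",
    actions=[
        AddRows(data_files=["new-1.parquet"]),
        DeleteRows(delete_files=["deletes-1.parquet"]),
    ],
)
payload = to_payload(request)
```

The point is only that two action types may cover what an end user needs;
how the server resolves conflicts between them is a separate question.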

-Jack

On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io> wrote:

> Thanks, Drew.
>
> I think it's a good idea in general to be able to perform commits on the
> server-side, but I would much rather break this down into smaller parts. I
> would definitely want to start with just file append use cases, since I
> think that is the biggest win. It can reduce retries and is an easy way to
> write from non-JVM languages or just simpler applications.
>
> I'm skeptical that there is a benefit to implementing the set of data
> operations from the Java API. That's primarily because I don't think that
> use case 1 (better conflict resolution) is actually achieved. You can avoid
> retries on the client, but the retries must happen _somewhere_. The
> proposal is to roll back rewrite commits, but that's already possible with
> the much simpler API that exists today. Maybe I'm missing something?
>
> Even if I'm mistaken about being able to improve conflict resolution, I
> think that there is quite a bit of work here and I'd break this down either
> way. Starting with append use cases makes a lot of sense to me, but I'm
> interested to hear what others think as well.
>
> Ryan
>
> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
> wrote:
>
>> Regarding the multiple emails sent earlier, please use this one for
>> discussions.
>>
>> Thank you!
>>
>>
>> On 2023/12/07 00:47:42 Drew wrote:
>> > Hi everyone,
>> >
>> > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon EMR
>> > and Athena. I’m reaching out to share a proposal that introduces data
>> > commits as a part of the RESTCatalog. The current process for data commits
>> > lives on the client side, and by shifting this logic into the REST catalog,
>> > we can empower the catalog service with more control of this process.
>> >
>> > This proposal addresses specific use cases that showcase the benefits of
>> > moving the commit logic to the service side. For instance, this shift
>> > allows the user to refine conflict resolution mechanisms, giving precedence
>> > to operations that modify the table state to ensure their completion
>> > without conflict. Furthermore, our POC demonstrated an improvement in the
>> > success rate of concurrent write operations against the GlueCatalog. This
>> > all can be found in the detailed proposal below. Feel free to comment, and
>> > add your suggestions!
>> >
>> > Detailed proposal:
>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>> > Github POC: https://github.com/apache/iceberg/pull/9237
>> >
>> > Looking forward to hearing back
>> >
>> > Thanks,
>> >
>> > Drew Gallardo
>> > Amazon EMR & Athena
>> > d...@amazon.com
>> >
>
>
>
> --
> Ryan Blue
> Tabular
>
