> Based on my understanding of the proposal, I think it's more about the
> possibility of enabling other ways that do not require a full rollback.
> It's just that we currently implemented it as a rollback to prove the
> feasibility.

My main question is this: what can be done besides rolling back a commit?
And why does that require 5 extra routes and metadata writes from the REST
service?

On Mon, Dec 11, 2023 at 11:27 AM Jack Ye <yezhao...@gmail.com> wrote:

> > The proposal is to roll back rewrite commits, but that's already
> possible with the much simpler API that exists today.
>
> Based on my understanding of the proposal, I think it's more about the
> possibility of enabling other ways that do not require a full rollback.
> It's just that we currently implemented it as a rollback to prove the
> feasibility. But given that we now have full access to the changes in each
> data commit (rather than only the post-change snapshot), we could
> potentially reuse some files that have been rewritten.
>
> > I'm skeptical that there is a benefit to implementing the set of data
> operations from the Java API
>
> +1. The current Java API might be a bit redundant; some APIs serve very
> similar purposes. From the end user's perspective, I feel the important
> data actions to have are basically the ability to (1) AddRows and (2)
> DeleteRows?
>
> -Jack
>
> On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Drew.
>>
>> I think it's a good idea in general to be able to perform commits on the
>> server-side, but I would much rather break this down into smaller parts. I
>> would definitely want to start with just file append use cases, since I
>> think that is the biggest win. It can reduce retries and is an easy way to
>> write from non-JVM languages or just simpler applications.
>>
>> I'm skeptical that there is a benefit to implementing the set of data
>> operations from the Java API. That's primarily because I don't think that
>> use case 1 (better conflict resolution) is actually achieved. You can avoid
>> retries on the client, but the retries must happen _somewhere_. The
>> proposal is to roll back rewrite commits, but that's already possible with
>> the much simpler API that exists today. Maybe I'm missing something?
>>
>> Even if I'm mistaken about being able to improve conflict resolution, I
>> think that there is quite a bit of work here and I'd break this down either
>> way. Starting with append use cases makes a lot of sense to me, but I'm
>> interested to hear what others think as well.
>>
>> Ryan
>>
>> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
>> wrote:
>>
>>> Regarding the multiple emails sent earlier, please use this thread for
>>> discussion.
>>>
>>> Thank you!
>>>
>>>
>>> On 2023/12/07 00:47:42 Drew wrote:
>>> > Hi everyone,
>>> >
>>> > My name is Drew Gallardo, and I’m part of the Iceberg team at Amazon EMR
>>> > and Athena. I’m reaching out to share a proposal that introduces data
>>> > commits as part of the RESTCatalog. The current process for data commits
>>> > lives on the client side; by shifting this logic into the REST catalog,
>>> > we can give the catalog service more control over this process.
>>> >
>>> > This proposal addresses specific use cases that showcase the benefits of
>>> > moving the commit logic to the service side. For instance, this shift
>>> > allows the user to refine conflict resolution mechanisms, giving
>>> > precedence to operations that modify the table state so that they
>>> > complete without conflict. Furthermore, our POC demonstrated an
>>> > improvement in the success rate of concurrent write operations against
>>> > the GlueCatalog. All of this can be found in the detailed proposal below.
>>> > Feel free to comment and add your suggestions!
>>> >
>>> > Detailed proposal:
>>> >
>>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>>> > GitHub POC: https://github.com/apache/iceberg/pull/9237
>>> >
>>> > Looking forward to hearing back.
>>> >
>>> > Thanks,
>>> >
>>> > Drew Gallardo
>>> > Amazon EMR & Athena
>>> > d...@amazon.com
>>> >
>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
