Hi everyone,

I hope you all had great holidays! I wanted to resurface this proposal for
RESTful Data operations.

Currently, I have an open PR here:
https://github.com/apache/iceberg/pull/9292

Thanks,
Drew

On Wed, Dec 13, 2023 at 3:04 PM Jack Ye <yezhao...@gmail.com> wrote:

> Thanks Drew for the quick turnaround, I will take a deeper look into the
> PR.
>
> I think if we all agree that it is beneficial to have the
> AppendFiles(DataFile[]) API (maybe we should call it AppendRows instead), I
> would like to know if it also makes sense to have:
> 1. DeleteRows(DeleteFile[]), which can allow users to describe the
> deletion of rows easily through the equality delete spec
> 2. a way to combine the two APIs, AppendRows and DeleteRows, into a single
> type of action
>
> I find it pretty intuitive from a user perspective to express deletion of
> rows and commit them through equality deletes, and it would allow
> performing updates through simple applications.
>
> -Jack
>
> On Wed, Dec 13, 2023 at 2:22 PM Drew <img...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> Thanks for the feedback, I'll start going through the comments left in
>> the doc! You're right in pointing out that the logic here can be simplified
>> to roll back a commit. For now, I've opened a smaller PR that focuses on
>> the append files operation.
>>
>> GitHub PR: https://github.com/apache/iceberg/pull/9292
>> Drew
>>
>>
>> On Mon, Dec 11, 2023 at 11:33 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> > Based on my understanding of the proposal, I think it's more about the
>>> possibility of enabling other ways that do not require a full rollback.
>>> It's just that we currently implemented it as a rollback to prove the
>>> feasibility.
>>>
>>> My main question is this: what can be done besides rolling back a
>>> commit? And why does that require 5 extra routes and metadata writes from
>>> the REST service?
>>>
>>> On Mon, Dec 11, 2023 at 11:27 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> > The proposal is to roll back rewrite commits, but that's already
>>>> possible with the much simpler API that exists today.
>>>>
>>>> Based on my understanding of the proposal, I think it's more about the
>>>> possibility of enabling other ways that do not require a full rollback.
>>>> It's just that we currently implemented it as a rollback to prove the
>>>> feasibility. But given that now we have full access to the changes of each
>>>> data commit (compared to only the post-change snapshot), we could
>>>> potentially reuse some files that have been rewritten.
>>>>
>>>> > I'm skeptical that there is a benefit to implementing the set of data
>>>> operations from the Java API
>>>>
>>>> +1, the current Java API might be a bit redundant; some APIs serve very
>>>> similar purposes. I feel the important data actions to have from the end
>>>> user's perspective are basically the ability to (1) AddRows, (2)
>>>> DeleteRows?
>>>>
>>>> -Jack
>>>>
>>>> On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io> wrote:
>>>>
>>>>> Thanks, Drew.
>>>>>
>>>>> I think it's a good idea in general to be able to perform commits on
>>>>> the server-side, but I would much rather break this down into smaller
>>>>> parts. I would definitely want to start with just file append use cases,
>>>>> since I think that is the biggest win. It can reduce retries and is an easy
>>>>> way to write from non-JVM languages or just simpler applications.
>>>>>
>>>>> I'm skeptical that there is a benefit to implementing the set of data
>>>>> operations from the Java API. That's primarily because I don't think that
>>>>> use case 1 (better conflict resolution) is actually achieved. You can avoid
>>>>> retries on the client, but the retries must happen _somewhere_. The
>>>>> proposal is to roll back rewrite commits, but that's already possible with
>>>>> the much simpler API that exists today. Maybe I'm missing something?
>>>>>
>>>>> Even if I'm mistaken about being able to improve conflict resolution,
>>>>> I think that there is quite a bit of work here and I'd break this down
>>>>> either way. Starting with append use cases makes a lot of sense to me, but
>>>>> I'm interested to hear what others think as well.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew <d...@amazon.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Regarding the multiple emails sent earlier, please use this one
>>>>>> for discussions.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>>
>>>>>> On 2023/12/07 00:47:42 Drew wrote:
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > My name is Drew Gallardo, and I’m a part of the Iceberg team at Amazon EMR
>>>>>> > and Athena. I’m reaching out to share a proposal that introduces data
>>>>>> > commits as a part of the RESTCatalog. The current process for data commits
>>>>>> > lives on the client side, and by shifting this logic into the REST catalog,
>>>>>> > we can empower the catalog service with more control of this process.
>>>>>> >
>>>>>> > This proposal addresses specific use cases that showcase the benefits of
>>>>>> > moving the commit logic to the service side. For instance, this shift
>>>>>> > allows the user to refine conflict resolution mechanisms, giving precedence
>>>>>> > to operations that modify the table state to ensure their completion
>>>>>> > without conflict. Furthermore, our POC demonstrated an improvement in the
>>>>>> > success rate of concurrent write operations against the GlueCatalog. This
>>>>>> > all can be found in the detailed proposal below. Feel free to comment, and
>>>>>> > add your suggestions!
>>>>>> >
>>>>>> > Detailed proposal:
>>>>>> >
>>>>>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>>>>>> > GitHub POC: https://github.com/apache/iceberg/pull/9237
>>>>>> >
>>>>>> > Looking forward to hearing back.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Drew Gallardo
>>>>>> > Amazon EMR & Athena
>>>>>> > d...@amazon.com
>>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Tabular
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
