I’m not sure that there is a single tenet to follow, but I can outline how
I think about the REST protocol.

The problem that the REST API solves is to standardize catalog interaction
for Iceberg. I think that relies on being both a good standard and a good
API. A good standard is small, specific, and broadly useful. A good API is
well-defined, capable, and abstracts details.

These aren’t always aligned, which I think is the issue here. I do think
that a good direction for evolving the REST API is to make it possible for
the catalog service to take on additional responsibilities. The new
planning API is a good example of that: we can introduce straightforward
calls that alleviate the need for clients to do a lot of work. A file
delete API similarly takes on work and can simplify the client, but the
problem is that it would either be overly simple (and a misleading API) or
would need to be very complex (not a small and broadly useful standard).

I think that we may want to introduce a complex transaction endpoint at
some point, but it seems like a big jump from where we are at today. And
I’m not convinced that it has a lot of value — that’s why I’m asking about
real use cases. The value needs to outweigh the drawback of greatly
expanding the spec, or we need to find smaller changes that are useful like
the append case.

Right now, the client-side approach allows the standard to be broadly
useful, just like the table spec. You can make any changes you want in a
commit as long as the metadata is valid. Validations needed to make
consistency guarantees are handled by clients, which avoids needing to
expand the standard to express those constraints.
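
To make that concrete, here is a rough sketch of those client-side
validations using today's Java API (the table, files, snapshot id, and
filter are all illustrative):

import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

// Consistency checks run entirely on the client; the catalog only sees
// the resulting metadata, so the standard never has to express them.
table.newRowDelta()
    .addRows(newDataFile)
    .addDeletes(newDeleteFile)
    .validateFromSnapshot(scanSnapshotId)                 // snapshot the job planned from
    .conflictDetectionFilter(Expressions.equal("id", 5))  // rows the operation touched
    .validateNoConflictingDataFiles()
    .validateNoConflictingDeleteFiles()
    .commit();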

Ryan

On Mon, Feb 26, 2024 at 2:12 PM Jack Ye <yezhao...@gmail.com> wrote:

> > I don’t see how extending the REST protocol like this would make an
> > impact on that problem.
>
> I realize maybe we should take a step back and first align on the
> position of the REST protocol before digging further into what could be
> improved with delete files and CDC. Specifically, *what is the tenet to
> follow for evolving the REST spec*?
>
> My understanding is that we would like to develop this REST protocol so
> that *a catalog service can take on more responsibilities in optimizing the
> read and write paths of an Iceberg table*. This proposal (and also the
> plan table proposal) follows the direction above, where the catalog can
> take control of more logic at table scan and commit time with the proposed
> changes, and thus potentially enable more optimizations behind the scenes
> while remaining true to the Iceberg table (and view) spec.
>
> For the data commit code path specifically, yes, we started by trying to
> support concrete use cases like better streaming appends and CDC deletes,
> but the design aims at pushing the commit complexity to the service side
> if the service can handle it, and at realizing those use cases in the
> service with that design. The current UpdateTable API, which just adds and
> sets a full snapshot, makes it difficult to do much with optimization,
> because the majority of the work has already been done at the client side
> to produce the manifest list and snapshot.
>
> What are your thoughts on this?
>
> -Jack
>
>
>
> On Wed, Feb 21, 2024 at 5:00 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Okay, so it sounds like the motivation is to improve the story around
>> CDC. That’s a good area to work on, but I don’t see how extending the REST
>> protocol like this would make an impact on that problem. In addition, I’m
>> not following your rationale for a few things, so we should probably take a
>> look at those areas.
>>
>> I think Iceberg today does not have a good solution at the storage layer.
>> So far, we have basically just said that Iceberg is not efficient at doing
>> this because of the inefficiency in applying massively produced equality
>> delete files.
>>
>> What do you mean by “inefficiency in applying massively produced equality
>> delete files”? I’d like to understand that better because I think that the
>> tools are already in place for efficiency. What has been missing is a good
>> design for using those tools. I think we’ve only seen implementations that
>> create too many delete files because they don’t redistribute or sort
>> deletes. The implementations also don’t maintain delete files and stack up
>> way too many delete files. These are half-solutions that defer work until
>> later, putting all of the cost on the read path.
>>
>> we thought about proposing something like an “upsert file” format so that
>> we can run a more efficient LSM algorithm than the one used to associate
>> delete files with data files using stats and then apply deletes as filters
>>
>> I think that we have the tools to do this already. While we chose to
>> separate deletes from inserts, the ways that the LSM approach reduces work
>> by sorting and finding overlapping ranges are already applied when Iceberg
>> plans a scan and associates deletes with data files. Iceberg also has
>> metadata to track the sort order of delete files if we want to update to
>> use a merge approach. These tools are largely unused because the write path
>> doesn’t prepare data to take advantage of them.
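>>
>> For example (a minimal sketch, assuming a table with an "id" column), the
>> sort order metadata is already declarable through the Java API, and
>> writers that honor it produce files whose order is tracked in table
>> metadata:
>>
>> import org.apache.iceberg.NullOrder;
>> import org.apache.iceberg.Table;
>>
>> // Declare a table-level sort order; files written under it record the
>> // corresponding sort_order_id, which a merge-based reader could use.
>> table.replaceSortOrder()
>>     .asc("id", NullOrder.NULLS_FIRST)
>>     .commit();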
>>
>> It seems like we’ve come to different conclusions about the underlying
>> issues and I’d like to dig in more to find out what you’d change or what
>> specific features of an LSM approach you’re looking for that we can’t do
>> today. Maybe we should set up a time to talk with a group that wants to
>> work on this area?
>>
>> Ryan
>>
>> On Wed, Feb 21, 2024 at 3:55 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Thanks for the response Ryan!
>>>
>>> > The solution to the problem above is to add more to the API — maybe
>>> > have a single endpoint that can delete and append files in a single
>>> > commit. But then pushing this to the server requires that we also
>>> > support validations to ensure the swap is valid when there are retries.
>>>
>>> Yes, this is basically what the original proposal suggested, and I
>>> agree it is a very big and complicated surface area that we are proposing
>>> to change. If there are strong concerns about that, I feel we should go
>>> back to the original reason why I was personally very interested in that
>>> delete use case, and try to see if there are alternative solutions.
>>>
>>> The main use case that Drew's team is interested in is efficient
>>> append-only streaming, so the new proposal probably works well. On the
>>> other hand, I was trying to see if we could improve the upsert-streaming
>>> (CDC) use case as a part of this effort. I think Iceberg today does not
>>> have a good solution at the storage layer. So far, we have basically just
>>> said that Iceberg is not efficient at doing this because of the
>>> inefficiency in applying massively produced equality delete files. Instead,
>>> we recommend that users push the work to the compute layer. Maybe it is okay
>>> to have this stance, but I just think it is a missed opportunity.
>>>
>>> My hope was that by extending the API surface to be flexible enough,
>>> these challenges can be overcome at least at the service layer, by
>>> leveraging more complex, purpose-built components to handle a large
>>> commit volume and a large number of delete files that need to be applied
>>> at scanning time. If we do not want to go this route due to the concerns
>>> above, then I think we should think more about storage layer solutions.
>>>
>>> For example, back when we were developing delete file support, we
>>> thought about proposing something like an "upsert file" format so that we
>>> could run a more efficient LSM algorithm than the one used to associate
>>> delete files with data files using stats and then apply deletes as filters.
>>> But we never did that because the delete format still required some time to
>>> mature. Maybe to fully solve the issue, we should think about adding it. I
>>> believe this is technically also what Apache Paimon is doing (I have very
>>> limited knowledge of that project, so sorry if I am wrong on this).
>>>
>>> Any thoughts?
>>>
>>> -Jack
>>>
>>> On Wed, Feb 21, 2024 at 11:10 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> Thanks for pushing this forward, Drew and Jack!
>>>>
>>>> Jack just asked “how would such endpoints work with multi-table
>>>> transactions?” — that demonstrates a big concern that I have about adding
>>>> remove or delete file append endpoints. I don’t think that those endpoints
>>>> can or should be used for transactions, and I worry that they are too
>>>> confusing to put into the API because people will use them to build unsafe
>>>> features.
>>>>
>>>> For example, say we add an endpoint to allow people to delete data
>>>> files. There are definitely use cases for this, like deleting old
>>>> partitions from a table. But I could easily see someone using it for
>>>> transactional purposes as well, like calling delete and then append to
>>>> replace a data file (this is the simplest and “safest” case I can think
>>>> of). That just takes a Parquet reader/writer and a couple of REST calls,
>>>> and I think people would resort to using it without knowing that it is
>>>> dangerous. That opens up cases where there’s a valid snapshot where the
>>>> data is missing, and failures can cause data loss.
>>>>
>>>> The solution to the problem above is to add more to the API — maybe
>>>> have a single endpoint that can delete and append files in a single commit.
>>>> But then pushing this to the server requires that we also support
>>>> validations to ensure the swap is valid when there are retries. For
>>>> instance, if this is the result of running UPDATE ... WHERE ... with
>>>> serializable isolation, the client would validate that no changes to rows
>>>> that match the WHERE clause have happened. That validation would now
>>>> need to run on the server side and so we would need to extend the API to be
>>>> able to handle it. That makes this API surface area much, much larger and
>>>> more complicated.
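>>>>
>>>> As a rough sketch of what the client does today (the filter, snapshot
>>>> id, and file swaps are illustrative), that validation is a few builder
>>>> calls:
>>>>
>>>> import org.apache.iceberg.Table;
>>>> import org.apache.iceberg.expressions.Expression;
>>>> import org.apache.iceberg.expressions.Expressions;
>>>>
>>>> // Serializable UPDATE ... WHERE ...: fail the commit if rows matching
>>>> // the filter changed since the update was planned. The actual file
>>>> // removals and additions are elided here.
>>>> Expression where = Expressions.equal("region", "us-east-1");
>>>> table.newOverwrite()
>>>>     .validateFromSnapshot(baseSnapshotId)
>>>>     .conflictDetectionFilter(where)
>>>>     .validateNoConflictingData()
>>>>     .validateNoConflictingDeletes()
>>>>     .commit();
>>>>
>>>> Moving this server-side means the REST API would have to carry the
>>>> filter expression and the validation options, which is exactly where
>>>> the surface area grows.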
>>>>
>>>> Adding the generic commit API is a ton more work for something that
>>>> works today, so I think we need to be cautious and proceed only if there is a
>>>> strong argument that this is needed. The argument for this is that it makes
>>>> writing from new languages simpler, but that’s not enough (yet) to convince
>>>> me that it is a good idea.
>>>>
>>>> I think there’s a strong case for append, but for deletes I don’t think
>>>> we need or want to go there. I certainly would not want to add delete
>>>> endpoints that would be misused.
>>>>
>>>> Ryan
>>>>
>>>> On Tue, Feb 20, 2024 at 3:28 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> I think there is also a point we were discussing but never closed:
>>>>> whether AppendDeleteFiles should be supported. The recent
>>>>> development in Kafka, and vendor products like Upsolver Zero-ETL
>>>>> <https://www.upsolver.com/blog/upsolver-announces-zero-etl-and-lakehouse-optimization-for-apache-iceberg>
>>>>> seem to suggest that there is a demand for people to also just
>>>>> append/stream deletes to a table. So I think it would be ideal if we
>>>>> could support both data and delete files in a new append API.
>>>>>
>>>>> For RemoveDataFiles and RemoveDeleteFiles, does that mean we need
>>>>> another tables/{table}/remove endpoint? Also, how would such endpoints
>>>>> work with multi-table transactions? Let's think through those points.
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Tue, Feb 20, 2024 at 3:13 PM Drew <img...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> We are discussing REST spec changes to add support for DataFiles and
>>>>>> DeleteFiles in both the append and scan planning APIs (PR:
>>>>>> https://github.com/apache/iceberg/pull/9717). One thing that came up
>>>>>> for appends was that this logic shouldn’t live in the table update API;
>>>>>> instead, it should have a dedicated endpoint. This would benefit a few
>>>>>> use cases, such as asynchronous appends and batch commit support.
>>>>>>
>>>>>> I’d like to start a discussion about introducing this new endpoint
>>>>>> and its functionality to support the ongoing fine-grained metadata
>>>>>> commit efforts. In the discussion on the ContentFile spec change PR,
>>>>>> the proposed endpoint was envisioned as an append endpoint that
>>>>>> handles update requests asynchronously. That discussion can be found
>>>>>> here:
>>>>>> https://github.com/apache/iceberg/pull/9717#discussion_r1495005890.
>>>>>> The proposed changes include:
>>>>>>
>>>>>> *Endpoint*:
>>>>>> POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/append
>>>>>>
>>>>>> *Request*:
>>>>>> {
>>>>>>   "accept-delay-ms": 300000, // acceptable delay for processing
>>>>>>   "data-files": [...]
>>>>>> }
>>>>>>
>>>>>> *Response*:
>>>>>> 202 Accepted
>>>>>> {
>>>>>>   // location is used to track status
>>>>>>   "location": "/v1/{prefix}/namespaces/{namespace}/tables/{table}/status/{id}"
>>>>>> }
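>>>>>>
>>>>>> As a purely illustrative sketch (this status resource shape is not
>>>>>> part of the proposal; the field names are hypothetical), polling could
>>>>>> look like:
>>>>>>
>>>>>> GET /v1/{prefix}/namespaces/{namespace}/tables/{table}/status/{id}
>>>>>>
>>>>>> 200 OK
>>>>>> {
>>>>>>   "status": "pending", // or "committed" / "failed"
>>>>>>   "snapshot-id": 1234567890 // hypothetical; present once committed
>>>>>> }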
>>>>>>
>>>>>> I'm interested in gathering your thoughts on the asynchronous
>>>>>> operation model and the suggested endpoint structure.
>>>>>>
>>>>>> Building on this, we previously discussed having these update
>>>>>> options: RemoveDataFiles and RemoveDeleteFiles. Given this new endpoint
>>>>>> structure, we should consider whether to have unified or separate
>>>>>> endpoints for these operations. For instance, should we organize these
>>>>>> under a shared endpoint and specify an operationType, or should we
>>>>>> establish distinct endpoints for each operation? Given that appends can
>>>>>> support batch processing, we can accommodate this in the request model.
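>>>>>>
>>>>>> To make the shared-endpoint option concrete, one possible (purely
>>>>>> hypothetical) shape would be:
>>>>>>
>>>>>> POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/changes
>>>>>> {
>>>>>>   "operation-type": "remove-data-files", // or "append-data-files", etc.
>>>>>>   "data-files": [...]
>>>>>> }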
>>>>>>
>>>>>> Thank you,
>>>>>> Drew
>>>>>>
>>>>>> On Fri, Jan 26, 2024 at 5:06 PM Drew <img...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I wanted to provide a quick update on the progress of the commit API
>>>>>>> proposal. Based on the feedback in the design doc and the Slack
>>>>>>> conversation with Dan and Jack, we've reached an agreement that this is
>>>>>>> more of a fine-grained metadata commit, rather than a data operation or
>>>>>>> commit. For the next steps, I'll be focusing on validating the
>>>>>>> requirements for the update requests. Additionally, I'll be working
>>>>>>> on adding the
>>>>>>> necessary tests to ensure its end-to-end functionality.
>>>>>>>
>>>>>>> Thanks for all the feedback. I still have an open PR for appendFiles;
>>>>>>> if you have a chance to review it, I would appreciate any additional
>>>>>>> feedback you may have.
>>>>>>>
>>>>>>> https://github.com/apache/iceberg/pull/9292
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Drew
>>>>>>>
>>>>>>> On Fri, Jan 12, 2024 at 3:40 PM Drew <img...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> I hope you all had great holidays! I wanted to resurface this
>>>>>>>> proposal for RESTful Data operations.
>>>>>>>>
>>>>>>>> Currently, I have an open PR here:
>>>>>>>> https://github.com/apache/iceberg/pull/9292
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Drew
>>>>>>>>
>>>>>>>> On Wed, Dec 13, 2023 at 3:04 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Drew for the quick turnaround. I will take a deeper look
>>>>>>>>> into the PR.
>>>>>>>>>
>>>>>>>>> I think if we all agree that it is beneficial to have the
>>>>>>>>> AppendFiles(DataFile[]) API (maybe we should call it AppendRows
>>>>>>>>> instead), I would like to know if it also makes sense to have:
>>>>>>>>> 1. DeleteRows(DeleteFile[]), which can allow users to describe the
>>>>>>>>> deletion of rows easily through the equality delete spec
>>>>>>>>> 2. combine the two APIs of AppendRows and DeleteRows into one single
>>>>>>>>> type of action
>>>>>>>>>
>>>>>>>>> I find it pretty intuitive from a user perspective to express
>>>>>>>>> deletion of rows and commit them through equality deletes, and it
>>>>>>>>> would allow performing updates through simple applications.
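>>>>>>>>>
>>>>>>>>> As a rough sketch (the file path, sizes, and the field id for "id"
>>>>>>>>> are all hypothetical), a simple application could express DeleteRows
>>>>>>>>> with today's Java API by committing an equality delete file:
>>>>>>>>>
>>>>>>>>> import org.apache.iceberg.DeleteFile;
>>>>>>>>> import org.apache.iceberg.FileFormat;
>>>>>>>>> import org.apache.iceberg.FileMetadata;
>>>>>>>>>
>>>>>>>>> // Equality deletes on field id 1 ("id"); assumes an unpartitioned
>>>>>>>>> // table, so no partition tuple is needed.
>>>>>>>>> DeleteFile idDeletes = FileMetadata.deleteFileBuilder(table.spec())
>>>>>>>>>     .ofEqualityDeletes(1)
>>>>>>>>>     .withPath("s3://bucket/deletes/00000-deletes.parquet")
>>>>>>>>>     .withFormat(FileFormat.PARQUET)
>>>>>>>>>     .withFileSizeInBytes(4096L)
>>>>>>>>>     .withRecordCount(100L)
>>>>>>>>>     .build();
>>>>>>>>>
>>>>>>>>> table.newRowDelta().addDeletes(idDeletes).commit();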
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Wed, Dec 13, 2023 at 2:22 PM Drew <img...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ryan,
>>>>>>>>>>
>>>>>>>>>> Thanks for the feedback; I'll start going through the comments
>>>>>>>>>> left in the doc! You're right in pointing out that the logic here
>>>>>>>>>> can be simplified to roll back a commit. For now, I introduced a
>>>>>>>>>> smaller PR that focuses on the append files operation.
>>>>>>>>>>
>>>>>>>>>> Github PR: https://github.com/apache/iceberg/pull/9292
>>>>>>>>>>
>>>>>>>>>> Drew
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 11, 2023 at 11:33 AM Ryan Blue <b...@tabular.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Based on my understanding of the proposal, I think it's more
>>>>>>>>>>> > about the possibility of enabling other ways that do not
>>>>>>>>>>> > require a full rollback. It's just that we currently implemented
>>>>>>>>>>> > it as a rollback to prove the feasibility.
>>>>>>>>>>>
>>>>>>>>>>> My main question is this: what can be done besides rolling back
>>>>>>>>>>> a commit? And why does that require 5 extra routes and metadata
>>>>>>>>>>> writes from the REST service?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 11, 2023 at 11:27 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > The proposal is to roll back rewrite commits, but that's
>>>>>>>>>>>> > already possible with the much simpler API that exists today.
>>>>>>>>>>>>
>>>>>>>>>>>> Based on my understanding of the proposal, I think it's more
>>>>>>>>>>>> about the possibility of enabling other ways that do not require
>>>>>>>>>>>> a full rollback. It's just that we currently implemented it as a
>>>>>>>>>>>> rollback to prove the feasibility. But given that we now have
>>>>>>>>>>>> full access to the changes of each data commit (compared to only
>>>>>>>>>>>> the post-change snapshot), we could potentially reuse some files
>>>>>>>>>>>> that have been rewritten.
>>>>>>>>>>>>
>>>>>>>>>>>> > I'm skeptical that there is a benefit to implementing the set
>>>>>>>>>>>> > of data operations from the Java API
>>>>>>>>>>>>
>>>>>>>>>>>> +1, the current Java API might be a bit redundant; some APIs
>>>>>>>>>>>> serve very similar purposes. I feel the important data actions
>>>>>>>>>>>> to have, from the end user's perspective, are basically the
>>>>>>>>>>>> ability to (1) AddRows and (2) DeleteRows?
>>>>>>>>>>>>
>>>>>>>>>>>> -Jack
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Dec 8, 2023 at 5:01 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Drew.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it's a good idea in general to be able to perform
>>>>>>>>>>>>> commits on the server-side, but I would much rather break this
>>>>>>>>>>>>> down into smaller parts. I would definitely want to start with
>>>>>>>>>>>>> just file append use cases, since I think that is the biggest
>>>>>>>>>>>>> win. It can reduce retries and is an easy way to write from
>>>>>>>>>>>>> non-JVM languages or just simpler applications.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm skeptical that there is a benefit to implementing the set
>>>>>>>>>>>>> of data operations from the Java API. That's primarily because
>>>>>>>>>>>>> I don't think that use case 1 (better conflict resolution) is
>>>>>>>>>>>>> actually achieved. You can avoid retries on the client, but the
>>>>>>>>>>>>> retries must happen _somewhere_. The proposal is to roll back
>>>>>>>>>>>>> rewrite commits, but that's already possible with the much
>>>>>>>>>>>>> simpler API that exists today. Maybe I'm missing something?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even if I'm mistaken about being able to improve conflict
>>>>>>>>>>>>> resolution, I think that there is quite a bit of work here and
>>>>>>>>>>>>> I'd break this down either way. Starting with append use cases
>>>>>>>>>>>>> makes a lot of sense to me, but I'm interested to hear what
>>>>>>>>>>>>> others think as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Dec 8, 2023 at 4:34 PM Gallardo, Drew
>>>>>>>>>>>>> <d...@amazon.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the multiple emails sent earlier, please use this
>>>>>>>>>>>>>> one for discussions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2023/12/07 00:47:42 Drew wrote:
>>>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > My name is Drew Gallardo, and I’m a part of the Iceberg team
>>>>>>>>>>>>>> > at Amazon EMR and Athena. I’m reaching out to share a proposal
>>>>>>>>>>>>>> > that introduces data commits as a part of the RESTCatalog. The
>>>>>>>>>>>>>> > current process for data commits lives on the client side, and
>>>>>>>>>>>>>> > by shifting this logic into the REST catalog, we can empower
>>>>>>>>>>>>>> > the catalog service with more control of this process.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > This proposal addresses specific use cases that showcase the
>>>>>>>>>>>>>> > benefits of moving the commit logic to the service side. For
>>>>>>>>>>>>>> > instance, this shift allows the user to refine conflict
>>>>>>>>>>>>>> > resolution mechanisms, giving precedence to operations that
>>>>>>>>>>>>>> > modify the table state to ensure their completion without
>>>>>>>>>>>>>> > conflict. Furthermore, our POC demonstrated an improvement in
>>>>>>>>>>>>>> > the success rate of concurrent write operations against the
>>>>>>>>>>>>>> > GlueCatalog. This all can be found in the detailed proposal
>>>>>>>>>>>>>> > below. Feel free to comment, and add your suggestions!
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Detailed proposal:
>>>>>>>>>>>>>> > https://docs.google.com/document/d/1OG68EtPxLWvNBJACQwcMrRYuGJCnQas8_LSruTRcHG8/edit?usp=sharing
>>>>>>>>>>>>>> > Github POC: https://github.com/apache/iceberg/pull/9237
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Looking forward to hearing back
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Drew Gallardo
>>>>>>>>>>>>>> > Amazon EMR & Athena
>>>>>>>>>>>>>> > d...@amazon.com
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>>> Tabular
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Tabular
>>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
