I did not notice the difference between table and view. Should we change
that for tables then?

It depends on what we consider a breaking change at this point. Plus, we
may want it to be optional in the future.

My main point, though, is that I wouldn’t read too much into it being
optional. I think we all have the same expectations for REST services today
— that they need to follow both the Iceberg table spec and the REST spec. I
would treat findings like this as an opportunity to make the specs more
clear about requirements.
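
To make the commit requirement concrete, here is a minimal sketch of the atomic pointer swap discussed in this thread. It is a toy in-memory illustration, not Iceberg or REST catalog code: the `ToyCatalog` class, the `CommitFailed` exception, the table name, and the `s3://bucket/...` paths are all hypothetical, and a real catalog would do the conditional update in its backing store.

```python
import threading
from typing import Dict, Optional


class CommitFailed(Exception):
    """Raised when the caller's expected metadata location is stale."""


class ToyCatalog:
    """Hypothetical catalog: maps a table name to its current metadata file location."""

    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}
        self._lock = threading.Lock()

    def load(self, table: str) -> Optional[str]:
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> None:
        # Compare-and-swap: the pointer moves only if it still equals what the
        # writer read, which is what gives the table a linear history.
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitFailed(f"{table}: pointer moved, retry from latest metadata")
            self._pointers[table] = new


catalog = ToyCatalog()
catalog.commit("db.t", None, "s3://bucket/t/metadata/v1.metadata.json")

base = catalog.load("db.t")  # two writers both start from v1
catalog.commit("db.t", base, "s3://bucket/t/metadata/v2.metadata.json")  # first writer wins

stale_rejected = False
try:
    catalog.commit("db.t", base, "s3://bucket/t/metadata/v3.metadata.json")  # second writer is stale
except CommitFailed:
    stale_rejected = True
```

The second writer has to reload the current metadata and retry; whether the server reports the metadata location back to the client is a separate question from whether this swap happens.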

Ryan

On Thu, Feb 29, 2024 at 4:28 PM Jack Ye <yezhao...@gmail.com> wrote:

> > I feel like the goal is to identify those cases and steer them back into
> compliance with the spec
>
> +100000
>
> > as opposed to immediately claiming they're something entirely different
>
> In case this comment is talking about my last sentence "More extremely, it
> might be a totally different kind of table that is only surfaced through
> the REST models." I don't mean the Iceberg tables/view in a REST compatible
> catalog are entirely different. I mean there is another more extreme use
> case, where the REST catalog can surface other non-Iceberg tables (e.g.
> Hive Parquet tables) through the same REST model, which is a use case I
> mentioned previously that is an interesting application of REST that we see
> some users are interested in. Which is also why the metadata location is
> important to guide those use cases.
>
> > when we added the endpoint to load a VIEW, metadata-location was
> correctly marked as required
>
> hmmm interesting, you are right, I did not notice the difference between
> table and view. Should we change that for tables then?
>
> -Jack
>
> On Thu, Feb 29, 2024 at 4:20 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Oops. In the first paragraph, I meant “when we added the endpoint to load
>> a VIEW, metadata-location was correctly marked as required."
>>
>> On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Once again, I’m catching up late and might have a helpful perspective.
>>>
>>> I think there was a mistake in the OpenAPI spec for loading tables and
>>> the metadata-location is not listed as required. I don’t recall that
>>> being intentional, but maybe it was? Maybe for a different reason? Either
>>> way, when we added the endpoint to load a catalog, metadata-location
>>> was correctly marked as required.
>>>
>>> Whatever the reason for the field being optional, *the intent was never
>>> to change requirements from Iceberg* that metadata is written to files
>>> and atomic operations guarantee a linear history.
>>>
>>> I’m glad to clear up the confusion on that. Right now, *catalogs must
>>> write metadata files for Iceberg tables and should guarantee a linear
>>> history*.
>>>
>>> You may be able to get away with bending those rules (what Dan refers to
>>> as not compliant), but that’s unintentional. We may also choose to relax
>>> the requirement for metadata files in the future — I see support for the
>>> idea and have considered proposing it also. But for now, it’s a
>>> requirement, even if you don’t have to send the location to the client
>>> (though note that the client has a hard dependency on it!).
>>>
>>> Ryan
>>>
>>> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks <daniel.c.we...@gmail.com>
>>> wrote:
>>>
>>>> 1. I agree, this is what the spec currently requires
>>>>
>>>> 2. I agree, it's up for consideration
>>>>
>>>> 3. I agree, I think if an implementation didn't adhere to the current
>>>> spec requirements, I would say it's out of spec (not sure I'd go as far as
>>>> to say it's a different kind of table entirely).
>>>>
>>>> Just to expand on #3, we will find lots of cases where implementations
>>>> deviate (likely unintentionally) from the rest/table spec and I feel like
>>>> the goal is to identify those cases and steer them back into compliance
>>>> with the spec as opposed to immediately claiming they're something entirely
>>>> different.  The overarching goal is to improve openness and
>>>> interoperability.
>>>>
>>>> My main point is that there isn't an inherent incompatibility between
>>>> the REST spec and the Iceberg spec.  The preservation of the storage
>>>> representation was discussed and intentional during the design/development
>>>> of the REST spec.
>>>>
>>>> -Dan
>>>>
>>>>
>>>> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> > For example, I cannot validate the atomic behaviors Glue claims, but
>>>>> I wouldn't assert that it is non-compliant because of that.
>>>>>
>>>>> I think these are not comparable claims because the API scope is
>>>>> completely different, but I don't think it's worth arguing in depth. Let's
>>>>> try to see if we can have some consensus.
>>>>>
>>>>> Based on what you said above, do you agree with the following 3 points?
>>>>>
>>>>> 1. Today, a table/view in any catalog including a REST spec-compatible
>>>>> catalog is an Iceberg table/view if and only if it points to a JSON
>>>>> metadata file in storage. This concept is a part of the Iceberg table/view
>>>>> spec. There is a debate to be had about whether we want to remove this
>>>>> requirement. The argument for removing it (as Yufei said) is to use other
>>>>> storage for better performance. The argument against (as Amogh said) is to
>>>>> keep Iceberg open source friendly through the JSON format.
>>>>>
>>>>> 2. Today, a table/view in any catalog including a REST spec-compatible
>>>>> catalog is an Iceberg table/view if and only if it performs, behind the
>>>>> scenes, the atomic metadata file swap for every commit. This concept is a
>>>>> part of the Iceberg table/view spec. We should consider removing this
>>>>> requirement in the Iceberg table/view spec.
>>>>>
>>>>> 3. A table/view in an Iceberg REST spec-compatible catalog may or may
>>>>> not be an Iceberg table/view. The REST spec does not enforce this, and this
>>>>> stance will remain true going forward. For example, it could use the
>>>>> Iceberg table/view metadata structure but not store the metadata in a
>>>>> JSON file, or not use the metadata file swap commit procedure, or both, and
>>>>> in those cases it is not an Iceberg table/view. More extremely, it might be
>>>>> a totally different kind of table that is only surfaced through the REST
>>>>> models.
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks <daniel.c.we...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> > In that case are tables in a REST-compliant catalog still an
>>>>>> Iceberg table? I don't think so, because it is a table that only partially
>>>>>> follows the Iceberg table spec.
>>>>>>
>>>>>> If the catalog is REST compliant and complies with the Iceberg spec,
>>>>>> they are still Iceberg tables.  I can see there is an argument that if the
>>>>>> catalog is REST compliant but does not follow the commit requirements (or
>>>>>> aspects of the Iceberg spec), that you cannot call those Iceberg tables.
>>>>>> But the assertion that Iceberg tables in a REST catalog are de facto
>>>>>> non-compliant is incorrect.
>>>>>>
>>>>>> > I like the idea about validation for format compliance. But don't
>>>>>> think you can technically validate this. You can validate the static table
>>>>>> to see if it has all the Iceberg metadata components, but you can not
>>>>>> validate the internal behavior of the service during a commit to see if it
>>>>>> really atomically swapped a metadata file.
>>>>>>
>>>>>> Just because you cannot see/validate the implementation doesn't mean
>>>>>> that it is non-compliant.  For example, I cannot validate the atomic
>>>>>> behaviors Glue claims, but I wouldn't assert that it is non-compliant
>>>>>> because of that.
>>>>>>
>>>>>> I do think there is a discussion to be had about if/when we might
>>>>>> adjust the storage/swap requirements, but to reinforce Amogh's point,
>>>>>> removing those requirements would impact the openness and accessibility of
>>>>>> Iceberg, which I feel would hamper adoption.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu <flyrain...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>> We've periodically discussed removing the storage requirement and I
>>>>>>>> think there's a path forward to do that and would agree that standardizing
>>>>>>>> on REST, but I wouldn't say the justification for making this push is that
>>>>>>>> REST is not compliant so we can just ignore the table spec requirements.
>>>>>>>> There are a few more things to consider, which is that not
>>>>>>>> everything can use REST currently and making a hard cut away from file
>>>>>>>> based metadata could bifurcate access to Iceberg data.  There are also
>>>>>>>> aspects to the spec that reference the metadata paths (like metadata log,
>>>>>>>> though it's optional), but would likely need to be addressed.
>>>>>>>
>>>>>>>
>>>>>>> This is a bit off-topic. It makes sense to me to remove the storage
>>>>>>> requirement moving forward. The metadata.json file isn't necessary in the
>>>>>>> REST catalog. For example, the REST catalog may not have permission to
>>>>>>> write to the table owner's storage. It can still save it as a file, of
>>>>>>> course, but that doesn't quite make sense. Putting it in a key-value store
>>>>>>> or an RDBMS could be a better option.
>>>>>>>
>>>>>>> Given that we are going to remove the storage requirement, should we
>>>>>>> avoid the file path in the current design for things like the view spec? A
>>>>>>> solution like table identifier + version uuid may serve the purpose.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 29, 2024 at 1:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > There's no exemption that says if you're using REST you don't
>>>>>>>> need to follow the spec.  Why do you think that's the case?
>>>>>>>>
>>>>>>>> In that case are tables in a REST-compliant catalog still an
>>>>>>>> Iceberg table? I don't think so, because it is a table that only partially
>>>>>>>> follows the Iceberg table spec.
>>>>>>>>
>>>>>>>> I like the idea about validation for format compliance. But don't
>>>>>>>> think you can technically validate this. You can validate the static table
>>>>>>>> to see if it has all the Iceberg metadata components, but you can not
>>>>>>>> validate the internal behavior of the service during a commit to see if it
>>>>>>>> really atomically swapped a metadata file.
>>>>>>>>
>>>>>>>> So I think at minimum we should update the table/view spec to
>>>>>>>> remove the metadata file swap requirement. The Iceberg table/view spec
>>>>>>>> should be a pure format spec that specifies how the file is laid out in
>>>>>>>> storage.
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Thu, Feb 29, 2024 at 1:22 PM Amogh Jahagirdar <am...@tabular.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I want to echo Dan's point that just because there is a separate
>>>>>>>>> spec for a REST Catalog does not mean that implementations can deviate from
>>>>>>>>> the spec's definition of the commit protocol or metadata layout, and still
>>>>>>>>> be considered "spec compliant".
>>>>>>>>>
>>>>>>>>> > Secondly, once we do that, we should declare REST spec as the
>>>>>>>>> official catalog spec to interact with Iceberg tables. Otherwise at least I
>>>>>>>>> will be very tempted to just break the atomic pointer swap pattern and
>>>>>>>>> store the entire metadata using the Glue Table object to achieve much
>>>>>>>>> better performance and also Glue native feature integrations, and I think
>>>>>>>>> other players will be equally motivated to do something similar. That will
>>>>>>>>> lead to even more chaos in the Iceberg catalog space.
>>>>>>>>>
>>>>>>>>> On this, a second point I want to make is around the openness of
>>>>>>>>> this ecosystem. We all already know that openness (the file formats, the
>>>>>>>>> metadata layout, the spec itself) is a fundamental tenet of the project.
>>>>>>>>> If we take the provided example of removing the metadata JSON file and
>>>>>>>>> moving it to some other storage, I think that goes against this principle
>>>>>>>>> since a JSON file is quite open by definition. Going back to the first
>>>>>>>>> point, I think a catalog which has such a behavior would *not* be
>>>>>>>>> considered spec compliant. Another reason this is important is that what
>>>>>>>>> is healthiest for all users of Iceberg is to have a healthy list
>>>>>>>>> of options for catalog choices. Storing the metadata JSON in non-open ways
>>>>>>>>> can make users' lives harder when trying out new catalogs, since the
>>>>>>>>> metadata would be stored in each catalog's own way, and users will have a
>>>>>>>>> harder time accessing their own data.
>>>>>>>>>
>>>>>>>>> A last point I'd like to make is I think there's a good discussion
>>>>>>>>> to be had on how we validate that a REST Catalog implementation is spec
>>>>>>>>> compliant. I think that's really beneficial for the ecosystem as a whole.
>>>>>>>>> Before that, though, I think we'd first want to conclude on this topic
>>>>>>>>> itself.
>>>>>>>>>
>>>>>>>>> On Thu, Feb 29, 2024 at 12:29 PM Daniel Weeks <
>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> > REST spec-compliant catalog does not need to follow the Iceberg
>>>>>>>>>> spec to commit or store metadata
>>>>>>>>>>
>>>>>>>>>> If the REST implementation doesn't follow the Iceberg spec for
>>>>>>>>>> commit requirements, it's not compliant with the spec.  There's no
>>>>>>>>>> exemption that says if you're using REST you don't need to follow the
>>>>>>>>>> spec.  Why do you think that's the case?
>>>>>>>>>>
>>>>>>>>>> I don't believe there's a reason to say that the REST spec needs
>>>>>>>>>> to enforce the commit requirements either, that's a requirement of the
>>>>>>>>>> Iceberg spec and still needs to be complied with.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> > The implementation of the spec can either be compliant or not.
>>>>>>>>>>>
>>>>>>>>>>> This is exactly the problem we are talking about right? Just to
>>>>>>>>>>> give an example, we cannot technically say that tables/views in the Tabular
>>>>>>>>>>> catalog are Iceberg tables/views, because a REST spec-compliant catalog
>>>>>>>>>>> does not need to follow the Iceberg spec to commit or store metadata. Even
>>>>>>>>>>> if you say it is, there is no way to really prove that, because the REST
>>>>>>>>>>> spec does not enforce it.
>>>>>>>>>>>
>>>>>>>>>>> JB, what do you mean by participating on the Catalog RFC? Is
>>>>>>>>>>> there already an ongoing RFC?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré <
>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with your statement that the REST Spec is not an implementation,
>>>>>>>>>>>> but I strongly disagree with your statement "impl of the spec can either be
>>>>>>>>>>>> compliant or not".
>>>>>>>>>>>>
>>>>>>>>>>>> The REST Catalog spec impl should be consistent with the REST
>>>>>>>>>>>> Spec. That's why a reference implementation in Iceberg would be a must,
>>>>>>>>>>>> with a TCK.
>>>>>>>>>>>>
>>>>>>>>>>>> The REST Spec should bridge/give access to Table/View metadata.
>>>>>>>>>>>> I think it would make sense to have a resource to GET the Table/View
>>>>>>>>>>>> metadata, also supporting PUT to update.
>>>>>>>>>>>> JSON Schema and eventually JSON RPC could help in some areas
>>>>>>>>>>>> here (compliant with OpenAPI).
>>>>>>>>>>>>
>>>>>>>>>>>> In another thread, I propose to work on a Catalog RFC, exactly
>>>>>>>>>>>> to target this. I think it would make sense to have the REST/Catalog RFC as
>>>>>>>>>>>> the main catalog API, so it has to be both consistent (giving access to
>>>>>>>>>>>> table/view metadata) and extensible (via OpenAPI Extensions for instance).
>>>>>>>>>>>>
>>>>>>>>>>>> So, I agree with Jack: the minimum would be to have JSON
>>>>>>>>>>>> metadata exposed by the REST Spec.
>>>>>>>>>>>>
>>>>>>>>>>>> @Jack, short term I'm in favor of your proposal, long term, I
>>>>>>>>>>>> propose to participate on the Catalog RFC (REST Spec). WDYT ?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks !
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Feb 29, 2024 at 8:47 PM Daniel Weeks <
>>>>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Jack,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure I agree with the framing of this argument.  The
>>>>>>>>>>>>> REST Spec defines a protocol, not an implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The implementation of the spec can either be compliant or
>>>>>>>>>>>>> not.  So a REST Implementation that adheres to all the requirements (atomic
>>>>>>>>>>>>> location swap, json representation, etc.), would be compliant.  There's no
>>>>>>>>>>>>> requirement around who performs these operations and with REST, that is
>>>>>>>>>>>>> delegated to the server.  The optional metadata location doesn't mean that
>>>>>>>>>>>>> there isn't a metadata location, just that it may not be exposed directly
>>>>>>>>>>>>> in the response.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, an implementation where you just store the table
>>>>>>>>>>>>> metadata in a Glue Table object, would not be compliant, currently.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We've periodically discussed removing the storage requirement
>>>>>>>>>>>>> and I think there's a path forward to do that and would agree that
>>>>>>>>>>>>> standardizing on REST, but I wouldn't say the justification for making this
>>>>>>>>>>>>> push is that REST is not compliant so we can just ignore the table spec
>>>>>>>>>>>>> requirements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are a few more things to consider, which is that not
>>>>>>>>>>>>> everything can use REST currently and making a hard cut away from file
>>>>>>>>>>>>> based metadata could bifurcate access to Iceberg data.  There are also
>>>>>>>>>>>>> aspects to the spec that reference the metadata paths (like metadata log,
>>>>>>>>>>>>> though it's optional), but would likely need to be addressed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just want to pull this specific topic out of the materialized
>>>>>>>>>>>>>> view discussion thread. I noticed this during the MV discussion, and I
>>>>>>>>>>>>>> think it is important to clarify this not just for the MV topic, but also
>>>>>>>>>>>>>> for the ongoing discussion to consolidate all the different catalogs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How the table/view spec defines Iceberg table/view*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we look into the table/view spec, the optimistic
>>>>>>>>>>>>>> concurrency section
>>>>>>>>>>>>>> <https://iceberg.apache.org/spec/#optimistic-concurrency>
>>>>>>>>>>>>>> requires the existence of a metadata file, and the atomic swap of the
>>>>>>>>>>>>>> metadata file ensures serializable isolation. This implies 2 things:
>>>>>>>>>>>>>> 1. there is a metadata file in storage that holds the information
>>>>>>>>>>>>>> described in the rest of the spec.
>>>>>>>>>>>>>> 2. there is an object in a catalog that holds the pointer of
>>>>>>>>>>>>>> the metadata file. What object and what catalog is implementation
>>>>>>>>>>>>>> dependent, but these generalized concepts are always intact.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The JSON serialization parts of the spec plus the reader
>>>>>>>>>>>>>> requirements also imply that the metadata file is in JSON format.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So when we talk about an Iceberg table/view that is compliant
>>>>>>>>>>>>>> with the spec, it is the combination of all these 5 requirements:
>>>>>>>>>>>>>> 1. there is an object in the catalog representing this
>>>>>>>>>>>>>> table/view
>>>>>>>>>>>>>> 2. there is a pointer to a JSON metadata file in the object
>>>>>>>>>>>>>> 3. the JSON metadata file exists in storage and contains the
>>>>>>>>>>>>>> table/view metadata content
>>>>>>>>>>>>>> 4. the metadata content is compliant with the standard
>>>>>>>>>>>>>> described in the spec
>>>>>>>>>>>>>> 5. serializable isolation is achieved by atomic swap of the
>>>>>>>>>>>>>> object pointer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How non-REST catalogs are compliant with the table/view spec*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> An implementation of the Iceberg table/view is essentially
>>>>>>>>>>>>>> specifying:
>>>>>>>>>>>>>> 1. what is the exact implementation of the catalog, e.g.
>>>>>>>>>>>>>> JDBC, Hive metastore (HMS), Glue, etc.
>>>>>>>>>>>>>> 2. what is the object that represents a table, e.g. a row in
>>>>>>>>>>>>>> the "iceberg_tables" table in JDBC, a Table object in HMS/Glue, etc.
>>>>>>>>>>>>>> 3. how is the JSON metadata file pointer stored, e.g. a
>>>>>>>>>>>>>> column in the table's row in JDBC, metadata_location key in the Table's
>>>>>>>>>>>>>> parameter map in HMS/Glue, etc.
>>>>>>>>>>>>>> 4. how the atomic swap is implemented, e.g. SQL atomic update
>>>>>>>>>>>>>> in JDBC, conditional parameter update in HMS, conditional version update in
>>>>>>>>>>>>>> Glue, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How the REST spec is NOT compliant with the table/view spec*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The REST spec technically does not match the following
>>>>>>>>>>>>>> table/view spec requirements:
>>>>>>>>>>>>>> 2. there is a pointer to a JSON metadata file in the object
>>>>>>>>>>>>>> 3. the JSON metadata file exists in storage and contains the
>>>>>>>>>>>>>> table/view metadata content
>>>>>>>>>>>>>> 5. serializable isolation is achieved by atomic swap of the
>>>>>>>>>>>>>> object pointer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The key parts in REST spec that are not compliant are:
>>>>>>>>>>>>>> 1. metadata-location field is optional in LoadTableResponse
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2721-L2728>
>>>>>>>>>>>>>> 2. pointer swap is not enforced in the UpdateTable
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L658>
>>>>>>>>>>>>>> operation
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, it opens the door for a REST service to be
>>>>>>>>>>>>>> completely not dependent on a JSON metadata file, store the Iceberg
>>>>>>>>>>>>>> table/view metadata not as a file, and achieve much better performance
>>>>>>>>>>>>>> characteristics than other catalogs. This technically gives a unique
>>>>>>>>>>>>>> advantage for REST catalog adopters that is not there for non-REST catalogs
>>>>>>>>>>>>>> like HMS and Glue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How can we fix this?*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I suggest the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Firstly, I think it is good that we try to remove the
>>>>>>>>>>>>>> requirements of JSON metadata file pointer and atomic pointer swap. We know
>>>>>>>>>>>>>> these requirements have perf limitations based on production usage,
>>>>>>>>>>>>>> especially when the metadata file is large. If that is the direction, we
>>>>>>>>>>>>>> should make it official by changing the table/view spec to say that those
>>>>>>>>>>>>>> requirements are catalog level implementation details that are no longer
>>>>>>>>>>>>>> required.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Secondly, once we do that, we should declare REST spec as the
>>>>>>>>>>>>>> official catalog spec to interact with Iceberg tables. Otherwise at least I
>>>>>>>>>>>>>> will be very tempted to just break the atomic pointer swap pattern and
>>>>>>>>>>>>>> store the entire metadata using the Glue Table object to achieve much
>>>>>>>>>>>>>> better performance and also Glue native feature integrations, and I think
>>>>>>>>>>>>>> other players will be equally motivated to do something similar. That will
>>>>>>>>>>>>>> lead to even more chaos in the Iceberg catalog space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With REST spec as the official catalog spec, we can actually
>>>>>>>>>>>>>> support non-REST catalogs by using the HTTP execution chain handler. Dan
>>>>>>>>>>>>>> has already done a prototype here
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/commit/619127ff69f89e43a1edef2ea94c3dd439396a8d#diff-869264a83ba9ca657e7defefaa16ad196b0de9fce6c87f97533db77f29e44762>
>>>>>>>>>>>>>> that is based on this discussion
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/8091#issuecomment-1647189146>
>>>>>>>>>>>>>> in the past about using AWS Lambda as an alternative HTTP client for REST
>>>>>>>>>>>>>> catalog. The same approach can be used to talk to HMS/Glue/JDBC/... while
>>>>>>>>>>>>>> users will only interact with the RESTCatalog as the entry point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this can provide a good path forward overall for the
>>>>>>>>>>>>>> catalog consolidation story, interested to know what others think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
