> I did not notice the difference between table and view. Should we change that for tables then?

It depends on what we consider a breaking change at this point. Plus, we may want it to be optional in the future. My main point, though, is that I wouldn't read too much into it being optional. I think we all have the same expectations for REST services today: that they need to follow both the Iceberg table spec and the REST spec. I would treat findings like this as an opportunity to make the specs clearer about requirements.

Ryan

On Thu, Feb 29, 2024 at 4:28 PM Jack Ye <yezhao...@gmail.com> wrote:

> > I feel like the goal is to identify those cases and steer them back into compliance with the spec
>
> +100000
>
> > as opposed to immediately claiming they're something entirely different
>
> In case this comment is talking about my last sentence, "More extremely, it might be a totally different kind of table that is only surfaced through the REST models," I don't mean the Iceberg tables/views in a REST-compatible catalog are entirely different. I mean there is another, more extreme use case, where the REST catalog can surface other non-Iceberg tables (e.g. Hive Parquet tables) through the same REST model. I mentioned this use case previously; it is an interesting application of REST that we see some users are interested in, and it is also why the metadata location is important to guide those use cases.
>
> > when we added the endpoint to load a VIEW, metadata-location was correctly marked as required
>
> hmmm interesting, you are right, I did not notice the difference between table and view. Should we change that for tables then?
>
> -Jack
>
> On Thu, Feb 29, 2024 at 4:20 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Oops. In the first paragraph, I meant "when we added the endpoint to load a VIEW, metadata-location was correctly marked as required."
>>
>> On Thu, Feb 29, 2024 at 4:18 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Once again, I'm catching up late and might have a helpful perspective.
>>>
>>> I think there was a mistake in the OpenAPI spec for loading tables, and the metadata-location is not listed as required. I don't recall that being intentional, but maybe it was? Maybe for a different reason? Either way, when we added the endpoint to load a catalog, metadata-location was correctly marked as required.
>>>
>>> Whatever the reason for the field being optional, *the intent was never to change requirements from Iceberg* that metadata is written to files and atomic operations guarantee a linear history.
>>>
>>> I'm glad to clear up the confusion on that. Right now, *catalogs must write metadata files for Iceberg tables and should guarantee a linear history*.
>>>
>>> You may be able to get away with bending those rules (what Dan refers to as not compliant), but that's unintentional. We may also choose to relax the requirement for metadata files in the future; I see support for the idea and have considered proposing it also. But for now, it's a requirement, even if you don't have to send the location to the client (though note that the client has a hard dependency on it!).
>>>
>>> Ryan
>>>
>>> On Thu, Feb 29, 2024 at 4:06 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>
>>>> 1. I agree, this is what the spec currently requires.
>>>>
>>>> 2. I agree, it's up for consideration.
>>>>
>>>> 3. I agree. I think if an implementation didn't adhere to the current spec requirements, I would say it's out of spec (not sure I'd go as far as to say it's a different kind of table entirely).
>>>>
>>>> Just to expand on #3, we will find lots of cases where implementations deviate (likely unintentionally) from the REST/table spec, and I feel like the goal is to identify those cases and steer them back into compliance with the spec as opposed to immediately claiming they're something entirely different. The overarching goal is to improve openness and interoperability.
>>>>
>>>> My main point is that there isn't an inherent incompatibility between the REST spec and the Iceberg spec. The preservation of the storage representation was discussed and intentional during the design/development of the REST spec.
>>>>
>>>> -Dan
>>>>
>>>>
>>>> On Thu, Feb 29, 2024 at 3:40 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> > For example, I cannot validate the atomic behaviors Glue claims, but I wouldn't assert that it is non-compliant because of that.
>>>>>
>>>>> I think these are not comparable claims because the API scope is completely different, but I don't think it's worth arguing in depth. Let's try to see if we can reach some consensus.
>>>>>
>>>>> Based on what you said above, do you agree with the following 3 points?
>>>>>
>>>>> 1. Today, a table/view in any catalog, including a REST spec-compatible catalog, is an Iceberg table/view if and only if it points to a JSON metadata file in storage. This concept is part of the Iceberg table/view spec. There is a debate to be had about whether we want to remove this requirement. The argument for removing it (as Yufei said) is to use other storage for better performance. The argument against it (as Amogh said) is to keep Iceberg open-source friendly through the JSON format.
>>>>>
>>>>> 2. Today, a table/view in any catalog, including a REST spec-compatible catalog, is an Iceberg table/view if and only if it performs, behind the scenes, the atomic metadata file swap for every commit. This concept is part of the Iceberg table/view spec. We should consider removing this requirement from the Iceberg table/view spec.
>>>>>
>>>>> 3. A table/view in an Iceberg REST spec-compatible catalog may or may not be an Iceberg table/view. The REST spec does not enforce this, and this stance will remain true going forward. For example, it could use the Iceberg table/view metadata structure but not store the metadata in a JSON file, or not use the metadata file swap commit procedure, or both; in those cases it is not an Iceberg table/view. More extremely, it might be a totally different kind of table that is only surfaced through the REST models.
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Thu, Feb 29, 2024 at 2:13 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>
>>>>>> > In that case are tables in a REST-compliant catalog still an Iceberg table? I don't think so, because it is a table that only partially follows the Iceberg table spec.
>>>>>>
>>>>>> If the catalog is REST compliant and complies with the Iceberg spec, they are still Iceberg tables. I can see there is an argument that if the catalog is REST compliant but does not follow the commit requirements (or aspects of the Iceberg spec), you cannot call those Iceberg tables. But the assertion that Iceberg tables in a REST catalog are de facto non-compliant is incorrect.
>>>>>>
>>>>>> > I like the idea about validation for format compliance. But I don't think you can technically validate this. You can validate the static table to see if it has all the Iceberg metadata components, but you can not validate the internal behavior of the service during a commit to see if it really atomically swapped a metadata file.
>>>>>>
>>>>>> Just because you cannot see/validate the implementation doesn't mean that it is non-compliant. For example, I cannot validate the atomic behaviors Glue claims, but I wouldn't assert that it is non-compliant because of that.
>>>>>>
>>>>>> I do think there is a discussion to be had about if/when we might adjust the storage/swap requirements, but to reinforce Amogh's point, removing those requirements would impact the openness and accessibility of Iceberg, which I feel would hamper adoption.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Feb 29, 2024 at 1:53 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>
>>>>>>>> We've periodically discussed removing the storage requirement and I think there's a path forward to do that and would agree that standardizing on REST, but I wouldn't say the justification for making this push is that REST is not compliant so we can just ignore the table spec requirements.
>>>>>>>>
>>>>>>>> There are a few more things to consider, which is that not everything can use REST currently and making a hard cut away from file based metadata could bifurcate access to Iceberg data. There are also aspects to the spec that reference the metadata paths (like metadata log, though it's optional), but would likely need to be addressed.
>>>>>>>
>>>>>>>
>>>>>>> This is a bit off-topic. It makes sense to me to remove the storage requirement moving forward. The metadata.json file isn't necessary in the REST catalog. For example, the REST catalog may not have permission to write to the table owner's storage. It can still save the metadata as a file, of course, but that doesn't quite make sense. Putting it in a key-value store or an RDBMS could be a better option.
>>>>>>>
>>>>>>> Given that we are going to remove the storage requirement, should we avoid the file path in the current design for things like the view spec? A solution like table identifier + version UUID may serve the purpose.
>>>>>>>
>>>>>>> Yufei
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Feb 29, 2024 at 1:29 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > There's no exemption that says if you're using REST you don't need to follow the spec. Why do you think that's the case?
>>>>>>>>
>>>>>>>> In that case are tables in a REST-compliant catalog still an Iceberg table? I don't think so, because it is a table that only partially follows the Iceberg table spec.
>>>>>>>>
>>>>>>>> I like the idea about validation for format compliance. But I don't think you can technically validate this. You can validate the static table to see if it has all the Iceberg metadata components, but you can not validate the internal behavior of the service during a commit to see if it really atomically swapped a metadata file.
>>>>>>>>
>>>>>>>> So I think at minimum we should update the table/view spec to remove the metadata file swap requirement. The Iceberg table/view spec should be a pure format spec that specifies how the file is laid out in storage.
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>> On Thu, Feb 29, 2024 at 1:22 PM Amogh Jahagirdar <am...@tabular.io> wrote:
>>>>>>>>
>>>>>>>>> I want to echo Dan's point that just because there is a separate spec for a REST Catalog does not mean that implementations can deviate from the spec's definition of the commit protocol or metadata layout and still be considered "spec compliant".
>>>>>>>>>
>>>>>>>>> > Secondly, once we do that, we should declare REST spec as the official catalog spec to interact with Iceberg tables. Otherwise at least I will be very tempted to just break the atomic pointer swap pattern and store the entire metadata using the Glue Table object to achieve much better performance and also Glue native feature integrations, and I think other players will be equally motivated to do something similar. That will lead to even more chaos in the Iceberg catalog space.
>>>>>>>>>
>>>>>>>>> On this, a second point I want to make is around the openness of this ecosystem. We all already know that openness (the file formats, the metadata layout, the spec itself) is a fundamental tenet of the project. If we take the provided example of removing the metadata JSON file and moving it to some other storage, I think that goes against this principle, since a JSON file is quite open by definition. Going back to the first point, I think a catalog with such behavior would *not* be considered spec compliant. Another reason this is important: what's healthiest for all users of Iceberg is a healthy list of options for catalog choices. Storing the metadata JSON in non-open ways can make users' lives harder when trying out new catalogs, since the metadata would now be stored in each catalog's own way, and users will have a harder time accessing their own data.
>>>>>>>>>
>>>>>>>>> A last point I'd like to make is that I think there's a good discussion to be had on how we validate that a REST Catalog implementation is spec compliant. I think that's really beneficial for the ecosystem as a whole. Before that, though, I think we'd first want to conclude on this topic itself.
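Amogh's question of how to validate a REST catalog, and Jack's remark that only the static table can be checked, can be illustrated with a minimal sketch. It assumes a LoadTableResponse has already been parsed into a Python dict with `metadata-location` and `metadata` keys. The helper name `static_compliance_problems` is hypothetical, and the field list reflects the top-level fields the v2 table spec marks as required; treat that list as an assumption to verify against the current spec, not as a complete conformance test.

```python
from typing import Any, Dict, List

# Top-level fields marked required for v2 table metadata. This list is an
# assumption for illustration; verify it against the current table spec.
REQUIRED_V2_FIELDS = (
    "format-version",
    "table-uuid",
    "location",
    "last-sequence-number",
    "last-updated-ms",
    "last-column-id",
    "schemas",
    "current-schema-id",
    "partition-specs",
    "default-spec-id",
    "last-partition-id",
    "sort-orders",
    "default-sort-order-id",
)


def static_compliance_problems(load_table_response: Dict[str, Any]) -> List[str]:
    """Report static problems in a parsed LoadTableResponse (hypothetical helper).

    Only the shape of the response is checked; nothing here can tell whether
    the server swapped a metadata file pointer atomically during a commit.
    """
    problems: List[str] = []
    # Optional in the OpenAPI spec, but the thread argues it should be present.
    if not load_table_response.get("metadata-location"):
        problems.append("missing metadata-location")
    metadata = load_table_response.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("missing metadata object")
        return problems
    for field in REQUIRED_V2_FIELDS:
        if field not in metadata:
            problems.append(f"metadata missing {field}")
    return problems
```

As Jack points out, a check like this covers only the static part; commit-time behavior would need a different kind of test.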
>>>>>>>>>
>>>>>>>>> On Thu, Feb 29, 2024 at 12:29 PM Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> > REST spec-compliant catalog does not need to follow the Iceberg spec to commit or store metadata
>>>>>>>>>>
>>>>>>>>>> If the REST implementation doesn't follow the Iceberg spec for commit requirements, it's not compliant with the spec. There's no exemption that says if you're using REST you don't need to follow the spec. Why do you think that's the case?
>>>>>>>>>>
>>>>>>>>>> I don't believe there's a reason to say that the REST spec needs to enforce the commit requirements either; that's a requirement of the Iceberg spec and still needs to be complied with.
>>>>>>>>>>
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 29, 2024 at 12:19 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> > The implementation of the spec can either be compliant or not.
>>>>>>>>>>>
>>>>>>>>>>> This is exactly the problem we are talking about, right? Just to give an example, we cannot technically say that tables/views in the Tabular catalog are Iceberg tables/views, because a REST spec-compliant catalog does not need to follow the Iceberg spec to commit or store metadata. Even if you say it is, there is no way to really prove that, because the REST spec does not enforce it.
>>>>>>>>>>>
>>>>>>>>>>> JB, what do you mean by participating on the Catalog RFC? Is there already an ongoing RFC?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Feb 29, 2024 at 12:08 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dan,
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with your statement that the REST Spec is not an implementation, but I strongly disagree with your statement "impl of the spec can either be compliant or not".
>>>>>>>>>>>>
>>>>>>>>>>>> The REST Catalog spec impl should be consistent with the REST Spec. That's why a reference implementation in Iceberg would be a must, with a TCK.
>>>>>>>>>>>>
>>>>>>>>>>>> The REST Spec should bridge/give access to Table/View metadata. I think it would make sense to have a resource to GET the Table/View metadata, also supporting PUT to update.
>>>>>>>>>>>> JSON Schema and eventually JSON RPC could help in some areas here (compliant with OpenAPI).
>>>>>>>>>>>>
>>>>>>>>>>>> In another thread, I proposed to work on a Catalog RFC, exactly to target this. I think it would make sense to have the REST/Catalog RFC as the main catalog API, so it has to be both consistent (giving access to table/view metadata) and extensible (via OpenAPI Extensions, for instance).
>>>>>>>>>>>>
>>>>>>>>>>>> So, I agree with Jack: the minimum would be to have JSON metadata exposed by the REST Spec.
>>>>>>>>>>>>
>>>>>>>>>>>> @Jack, short term I'm in favor of your proposal; long term, I propose to participate in the Catalog RFC (REST Spec). WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Feb 29, 2024 at 20:47, Daniel Weeks <daniel.c.we...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Jack,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure I agree with the framing of this argument. The REST Spec defines a protocol, not an implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The implementation of the spec can either be compliant or not. So a REST implementation that adheres to all the requirements (atomic location swap, JSON representation, etc.) would be compliant. There's no requirement around who performs these operations, and with REST, that is delegated to the server. The optional metadata location doesn't mean that there isn't a metadata location, just that it may not be exposed directly in the response.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, an implementation where you just store the table metadata in a Glue Table object would not be compliant, currently.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We've periodically discussed removing the storage requirement and I think there's a path forward to do that and would agree that standardizing on REST, but I wouldn't say the justification for making this push is that REST is not compliant so we can just ignore the table spec requirements.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are a few more things to consider, which is that not everything can use REST currently and making a hard cut away from file based metadata could bifurcate access to Iceberg data. There are also aspects to the spec that reference the metadata paths (like metadata log, though it's optional), but would likely need to be addressed.
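Dan's description of a compliant server, one that writes a JSON metadata file and performs the atomic location swap itself even if the location is not returned to the client, can be sketched as follows. This is a minimal, hypothetical sketch: `InMemoryCatalog`, `PointerConflict`, `commit`, and `demo` are illustrative names, a `threading.Lock` stands in for whatever atomic primitive a real catalog provides, and a local temporary directory stands in for object storage.

```python
import json
import os
import tempfile
import threading
import uuid


class PointerConflict(Exception):
    """The table's metadata pointer moved after the writer read it."""


class InMemoryCatalog:
    """Hypothetical catalog mapping a table name to its current metadata file.

    The lock stands in for the catalog's atomic primitive (for example a
    conditional SQL UPDATE or a conditional metastore update).
    """

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, name):
        with self._lock:
            return self._pointers.get(name)

    def swap(self, name, expected, new):
        # Compare-and-swap: move the pointer only if it still equals `expected`.
        with self._lock:
            if self._pointers.get(name) != expected:
                raise PointerConflict(name)
            self._pointers[name] = new


def commit(catalog, name, base_location, new_metadata, metadata_dir):
    """Write new metadata as a JSON file, then atomically swap the pointer."""
    new_location = os.path.join(metadata_dir, f"{uuid.uuid4()}.metadata.json")
    with open(new_location, "w", encoding="utf-8") as f:
        json.dump(new_metadata, f)
    # If another writer committed first, this raises and the new file is left
    # unreferenced; a real implementation would clean it up or retry.
    catalog.swap(name, base_location, new_location)
    return new_location


def demo():
    """Two sequential commits succeed; a commit based on a stale pointer fails."""
    with tempfile.TemporaryDirectory() as metadata_dir:
        catalog = InMemoryCatalog()
        v1 = commit(catalog, "db.t", None, {"format-version": 2}, metadata_dir)
        v2 = commit(catalog, "db.t", v1, {"format-version": 2}, metadata_dir)
        try:
            commit(catalog, "db.t", v1, {"format-version": 2}, metadata_dir)
            stale_rejected = False
        except PointerConflict:
            stale_rejected = True
        with open(catalog.current("db.t"), encoding="utf-8") as f:
            stored = json.load(f)
        return catalog.current("db.t") == v2, stale_rejected, stored["format-version"]
```

In this shape, whether the server also returns `new_location` as `metadata-location` is a separate decision from whether the file and the swap exist, which is the distinction Dan draws above.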
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 11:13 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just want to pull this specific topic out of the materialized view discussion thread. I noticed this during the MV discussion, and I think it is important to clarify this not just for the MV topic, but also for the ongoing discussion to consolidate all the different catalogs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How the table/view spec defines Iceberg table/view*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we look into the table/view spec, the optimistic concurrency section <https://iceberg.apache.org/spec/#optimistic-concurrency> requires the existence of a metadata file, and the atomic swap of the metadata file ensures serializable isolation. This implies 2 things:
>>>>>>>>>>>>>> 1. there is a metadata file in storage that holds the information described in the rest of the spec.
>>>>>>>>>>>>>> 2. there is an object in a catalog that holds the pointer to the metadata file. What object and what catalog is implementation dependent, but these generalized concepts are always intact.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The JSON serialization parts of the spec plus the reader requirements also imply that the metadata file is in JSON format.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So when we talk about an Iceberg table/view that is compliant with the spec, it is the combination of all these 5 requirements:
>>>>>>>>>>>>>> 1. there is an object in the catalog representing this table/view
>>>>>>>>>>>>>> 2. there is a pointer to a JSON metadata file in the object
>>>>>>>>>>>>>> 3. the JSON metadata file exists in storage and contains the table/view metadata content
>>>>>>>>>>>>>> 4. the metadata content is compliant with the standard described in the spec
>>>>>>>>>>>>>> 5. serializable isolation is achieved by atomic swap of the object pointer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How non-REST catalogs are compliant with the table/view spec*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> An implementation of the Iceberg table/view is essentially specifying:
>>>>>>>>>>>>>> 1. what the exact implementation of the catalog is, e.g. JDBC, Hive metastore (HMS), Glue, etc.
>>>>>>>>>>>>>> 2. what object represents a table, e.g. a row in the "iceberg_tables" table in JDBC, a Table object in HMS/Glue, etc.
>>>>>>>>>>>>>> 3. how the JSON metadata file pointer is stored, e.g. a column in the table's row in JDBC, the metadata_location key in the Table's parameter map in HMS/Glue, etc.
>>>>>>>>>>>>>> 4. how the atomic swap is implemented, e.g. SQL atomic update in JDBC, conditional parameter update in HMS, conditional version update in Glue, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How the REST spec is NOT compliant with the table/view spec*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The REST spec technically does not match the following table/view spec requirements:
>>>>>>>>>>>>>> 2. there is a pointer to a JSON metadata file in the object
>>>>>>>>>>>>>> 3. the JSON metadata file exists in storage and contains the table/view metadata content
>>>>>>>>>>>>>> 5. serializable isolation is achieved by atomic swap of the object pointer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The key parts in the REST spec that are not compliant are:
>>>>>>>>>>>>>> 1. the metadata-location field is optional in LoadTableResponse <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2721-L2728>
>>>>>>>>>>>>>> 2. the pointer swap is not enforced in the UpdateTable <https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L658> operation
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Therefore, it opens the door for a REST service to be completely independent of a JSON metadata file, store the Iceberg table/view metadata not as a file, and achieve much better performance characteristics than other catalogs. This technically gives REST catalog adopters a unique advantage that is not there for non-REST catalogs like HMS and Glue.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *How can we fix this?*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I suggest the following:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Firstly, I think it is good that we try to remove the requirements of a JSON metadata file pointer and atomic pointer swap. We know these requirements have perf limitations based on production usage, especially when the metadata file is large. If that is the direction, we should make it official by changing the table/view spec to say that those requirements are catalog-level implementation details that are no longer required.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Secondly, once we do that, we should declare REST spec as the official catalog spec to interact with Iceberg tables. Otherwise at least I will be very tempted to just break the atomic pointer swap pattern and store the entire metadata using the Glue Table object to achieve much better performance and also Glue native feature integrations, and I think other players will be equally motivated to do something similar. That will lead to even more chaos in the Iceberg catalog space.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With REST spec as the official catalog spec, we can actually support non-REST catalogs by using the HTTP execution chain handler. Dan has already done a prototype here <https://github.com/apache/iceberg/commit/619127ff69f89e43a1edef2ea94c3dd439396a8d#diff-869264a83ba9ca657e7defefaa16ad196b0de9fce6c87f97533db77f29e44762> that is based on this discussion <https://github.com/apache/iceberg/pull/8091#issuecomment-1647189146> in the past about using AWS Lambda as an alternative HTTP client for REST catalog. The same approach can be used to talk to HMS/Glue/JDBC/... while users will only interact with the RESTCatalog as the entry point.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this can provide a good path forward overall for the catalog consolidation story; interested to know what others think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
--
Ryan Blue
Tabular
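Jack's JDBC example, a row in an `iceberg_tables` table whose pointer column is changed with an atomic SQL update, can be sketched with Python's built-in `sqlite3` module. The column names and the helpers `create_catalog`, `create_table`, `swap_pointer`, and `demo` are assumptions loosely modeled on that description, not the schema of any particular catalog implementation. The key idea is that the `WHERE` clause includes the expected current location, so a writer working from a stale view updates zero rows and can detect the conflict.

```python
import sqlite3


def create_catalog():
    """In-memory stand-in for a JDBC catalog's pointer table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE iceberg_tables ("
        " table_namespace TEXT NOT NULL,"
        " table_name TEXT NOT NULL,"
        " metadata_location TEXT,"
        " previous_metadata_location TEXT,"
        " PRIMARY KEY (table_namespace, table_name))"
    )
    return conn


def create_table(conn, namespace, name, location):
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO iceberg_tables VALUES (?, ?, ?, NULL)",
            (namespace, name, location),
        )


def swap_pointer(conn, namespace, name, expected, new):
    """Atomically move the pointer; return False if another writer got there first."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE iceberg_tables"
            " SET metadata_location = ?, previous_metadata_location = ?"
            " WHERE table_namespace = ? AND table_name = ?"
            " AND metadata_location = ?",
            (new, expected, namespace, name, expected),
        )
    # Exactly one row changes only if the pointer still held `expected`.
    return cur.rowcount == 1


def demo():
    conn = create_catalog()
    create_table(conn, "db", "t", "v1.metadata.json")
    first = swap_pointer(conn, "db", "t", "v1.metadata.json", "v2.metadata.json")
    stale = swap_pointer(conn, "db", "t", "v1.metadata.json", "v3.metadata.json")
    row = conn.execute(
        "SELECT metadata_location, previous_metadata_location FROM iceberg_tables"
    ).fetchone()
    conn.close()
    return first, stale, row
```

The HMS and Glue variants Jack lists rely on the same compare-then-set idea, each through its own conditional-update mechanism.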