Hi all,
We have two copies of the Iceberg REST specification in the Polaris code
base:
* spec/iceberg-rest-catalog-open-api.yaml [1] and
* spec/polaris-catalog-service.yaml [2] - via [5] [6] [7] [8]
'iceberg-rest-catalog-open-api.yaml' is declared to be a 1:1 copy of
Iceberg's v.1.7.1 'open-api/rest-catalog-open-api.yaml', but in fact it
has already diverged beyond just code formatting.
'polaris-catalog-service.yaml' is based on a non-documented version of
Iceberg's 'open-api/rest-catalog-open-api.yaml' and is quite different,
but has hard dependencies on 'iceberg-rest-catalog-open-api.yaml'.
It is quite unclear, especially for (new) users, how those relate to
each other.
While I think that Polaris definitely needs it's own set of APIs for
it's genuine functionality, there should be a _clear_ separation from
Iceberg's endpoints and Iceberg's types - both in the OpenAPI specs and
in the endpoint path prefixes.
There is literally no guarantee that changes to Iceberg's OpenAPI spec
will work in/via 'polaris-catalog-service.yaml'. Even the "most
innocent" and non-breaking change to Iceberg's OpenAPI spec may
fundamentally break Polaris's OpenAPI spec. This is a latent risk - and
I think it is a quite serious risk.
There are also a couple of issues that have been copied to Polaris's
polaris-catalog-service.yaml:
1. The '/v1/oauth/tokens' endpoint is already deprecated for removal
2. The path-encoding of namespaces and table-identifiers is already
known to be incompatible with the current version 6.0 of the Servlet
Specification, especially "3.5.2. URI Path Canonicalization" point 10
("Rejecting Suspicious Sequences") [4]
3. The 'endpoints' array in Iceberg's 'CatalogConfig' type isn't
portable to all use cases. (Despite that it's unnecessarily overly
verbose IMHO.)
Apache Polaris does not own Apache Iceberg's OpenAPI spec. Iceberg is
completely independent on how it shapes that spec.
Nobody knows how v1 will evolve nor how v2 of Iceberg's OpenAPI spec
will look like. It is a big mistake and serious risk to assume that
there will never be a change in Iceberg's OpenAPI spec that will not
seriously affect or even break Polaris or introduce a lot of "backwards
compatibility constructs".
Another POV is that Polaris's OpenAPI spec does not only focus on
Iceberg but maybe also other table formats. Mixing other table formats
with the Iceberg specs is at least confusing.
There's no guarantee that new endpoints/types added Iceberg's OpenAPI
spec will not conflict with Polaris's endpoints/types.
While it's faster and easier to just rely on the _current_ version of
the Iceberg OpenAPI spec, it will cause a lot of unnecessary work in the
near-ish future.
I propose to let Polaris have it's own and **completely** independent
OpenAPI spec and replace 'polaris-catalog-service.yaml' to ensure that
Polaris's OpenAPI spec can never be broken by any Iceberg OpenAPI change.
Robert
[1]
https://github.com/apache/polaris/blob/d33454874f69f952da2dfb301d0330d6e8d2296e/spec/iceberg-rest-catalog-open-api.yaml
[2]
https://github.com/apache/polaris/blob/d33454874f69f952da2dfb301d0330d6e8d2296e/spec/polaris-catalog-service.yaml
[3]
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/open-api/rest-catalog-open-api.yaml
[4]
https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0#uri-path-canonicalization
[5] https://github.com/apache/polaris/pull/906
[6] https://github.com/apache/polaris/pull/936
[7] https://github.com/apache/polaris/pull/808
[8] https://github.com/apache/polaris/pull/1150
--
Robert Stupp
@snazy