There are still open concerns about the incompatibility w/ Servlet Spec
6.0, which can be a problem. That will be changed/addressed in Iceberg's
v2 REST spec. But I do not see a good reason to _introduce_ this in
Polaris APIs, knowing that it is an issue.
The other topic is "OAuth" - there is consensus (now) in Iceberg to
conform with the OAuth specification(s), and in turn _remove_ the
existing endpoint. Same concern here: I do not see a good reason to
_introduce_ this for Polaris APIs, already knowing that there is an
issue. We, Polaris, should really not re-define any OAuth specs.
Regarding "shared types" - that's exactly the problem. I do not see a
reason to change public facing Polaris APIs just because Iceberg's APIs
changed/evolved.
Regarding "shared paths" - that's also a problem. We (Polaris) do _not_
"control" Iceberg's paths/endpoints - there is no guarantee that there
will be no conflict ever. On top, there's the path-representation issue
mentioned above, which does change the representation of your example
about "namespaces".
When Iceberg introduces the v2 spec, Polaris would have to adopt its own
spec and implementation, causing unnecessary work for the project and
for users of the Polaris APIs.
Sure, it's easier and quicker to just reuse what Iceberg _currently_
defines. But it will cause more issues in the near-ish future, which can
be avoided with a little more effort. Polaris should really have APIs
that do not copy/implement already known issues and knowingly rely on
things that will go away.
NB: Iceberg's API is not "our" (Polaris) API - it's independent, Polaris
"just" implements it.
On 21.03.25 03:43, Honah J. wrote:
Thanks for the discussion so far!
'iceberg-rest-catalog-open-api.yaml' is declared to be a 1:1 copy of
Iceberg's v.1.7.1 'open-api/rest-catalog-open-api.yaml', but in fact it
has already diverged beyond just code formatting.
The iceberg-rest-catalog-open-api.yaml file should indeed be a 1:1 copy of
the upstream Iceberg 1.7.1 spec. I performed a text comparison and
confirmed that while there are formatting differences, the actual content
remains identical. Please let me know if I miss any diff here so we can fix
Sorry - that was my fault when comparing the yaml files. I confirm that
the files are identical (except for formatting, which is not a problem
at all).
it. However, it is true that Polaris’s Iceberg Catalog spec need some
customization from the Iceberg REST Catalog (IRC) spec: We have the
notification API and we do not support scan planning APIs yet. For the
/v1/oauth/tokens, It is already marked as deprecated and we have a separate
issue to track the removal https://github.com/apache/polaris/issues/12. The
reason that we made a copy of the token endpoint definition is let Polaris
community decide when to move it.
For the two main issues Yun highlighted:
For 1): The original intention behind consolidating all APIs in the root
file (polaris-catalog-service.yaml) was to minimize duplication, especially
regarding shared definitions like "servers" and "securitySchemes". However,
I understand that relying solely on Open API tags such as "Catalog API",
"Policy API", and "Generic API" to separate endpoints might cause
confusion. Hence I am ok to follow Yun's suggestion to split these APIs
clearly into two separate root files. This refactoring is straightforward
and can be done at any point without impacting existing code. We'll just
need an additional link on the website to render and preview Polaris APIs
separately after this change.
For 2) IMHO, we should maintain consistency across our APIs to avoid user
confusion. Introducing separate definitions for basic definitions such as
namespace, pageToken, pageSize, and error could complicate the user
experience unnecessarily. These core definitions are unlikely to experience
breaking changes frequently. Even in the event they do evolve, for example
from v1 to v2, we will need to do the update in Polaris anyway to
remain *Iceberg
compatible *and we will keep supporting v1 apis for a long time to allow
enough time for users to upgrade. Hence I am +1 on share basic definitions
with Iceberg spec file.
Would love to hear what others think about this!
Best,
Honah (Jonas)
On Thu, Mar 20, 2025 at 2:00 PM yun zou <yunzou.colost...@gmail.com> wrote:
Thanks Robert for bringing this up!
I see we mentioned two problems in this thread:
1) polaris-catalog-service.yaml contains reference to all catalog APIs
including the ones for Iceberg and our Polaris specific APIs,
which could be confusing to the users or new developers.
2) Polaris specific APIs also depend on the definition of
iceberg-rest-catalog-open-api.yaml. Since those are our own APIs,
shall we completely separate the definition with Iceberg spec?
For 1, although we can say polaris-catalog-service.yaml contains
references to all catalog APIs, I can also see the point about *confusion*.
I think it makes sense for us to have different xxx-service.yaml, so that
the purpose for different services can be clearly separated. For example,
polaris-iceberg-catalog-service.yaml to contain reference to all Iceberg
APIs with prefix catalog/api/v1, and polaris-catalog-service.yaml
contains reference to all Polaris specific APIs with prefix
/catalog/api/polaris/v1. Jonas have a doc
<
https://docs.google.com/document/d/1FAUnsfozCTca8CMwq04SHES1l2NwoiiXPlsS68s7Zh0/edit?tab=t.0#heading=h.ihz4p9n6lxnk
to
talk about the restructuring
For 2, Since Polaris is Iceberg compatible, especially the concept of
namespace is reused across the whole Polaris service, I think it makes
sense for us to reuse some Iceberg definitions, including namespace,
prefix, and basic error definition. For API specific requests and
response definition, I don't think it is a good idea to reuse them.
If one day Iceberg changed the definition for things like namespace, and if
Polaris wants to upgrade and provide compatibility
with this new version, then I think we need to update this concept across
the whole service. Unless Polaris decide to introduce two
different concepts of namespace and diverge with Iceberg, which i don't
think it is the case at this moment.
Best Regards,
Yun
On Tue, Mar 18, 2025 at 7:11 AM Robert Stupp <sn...@snazy.de> wrote:
Hi all,
We have two copies of the Iceberg REST specification in the Polaris code
base:
* spec/iceberg-rest-catalog-open-api.yaml [1] and
* spec/polaris-catalog-service.yaml [2] - via [5] [6] [7] [8]
'iceberg-rest-catalog-open-api.yaml' is declared to be a 1:1 copy of
Iceberg's v.1.7.1 'open-api/rest-catalog-open-api.yaml', but in fact it
has already diverged beyond just code formatting.
'polaris-catalog-service.yaml' is based on a non-documented version of
Iceberg's 'open-api/rest-catalog-open-api.yaml' and is quite different,
but has hard dependencies on 'iceberg-rest-catalog-open-api.yaml'.
It is quite unclear, especially for (new) users, how those relate to
each other.
While I think that Polaris definitely needs it's own set of APIs for
it's genuine functionality, there should be a _clear_ separation from
Iceberg's endpoints and Iceberg's types - both in the OpenAPI specs and
in the endpoint path prefixes.
There is literally no guarantee that changes to Iceberg's OpenAPI spec
will work in/via 'polaris-catalog-service.yaml'. Even the "most
innocent" and non-breaking change to Iceberg's OpenAPI spec may
fundamentally break Polaris's OpenAPI spec. This is a latent risk - and
I think it is a quite serious risk.
There are also a couple of issues that have been copied to Polaris's
polaris-catalog-service.yaml:
1. The '/v1/oauth/tokens' endpoint is already deprecated for removal
2. The path-encoding of namespaces and table-identifiers is already
known to be incompatible with the current version 6.0 of the Servlet
Specification, especially "3.5.2. URI Path Canonicalization" point 10
("Rejecting Suspicious Sequences") [4]
3. The 'endpoints' array in Iceberg's 'CatalogConfig' type isn't
portable to all use cases. (Despite that it's unnecessarily overly
verbose IMHO.)
Apache Polaris does not own Apache Iceberg's OpenAPI spec. Iceberg is
completely independent on how it shapes that spec.
Nobody knows how v1 will evolve nor how v2 of Iceberg's OpenAPI spec
will look like. It is a big mistake and serious risk to assume that
there will never be a change in Iceberg's OpenAPI spec that will not
seriously affect or even break Polaris or introduce a lot of "backwards
compatibility constructs".
Another POV is that Polaris's OpenAPI spec does not only focus on
Iceberg but maybe also other table formats. Mixing other table formats
with the Iceberg specs is at least confusing.
There's no guarantee that new endpoints/types added Iceberg's OpenAPI
spec will not conflict with Polaris's endpoints/types.
While it's faster and easier to just rely on the _current_ version of
the Iceberg OpenAPI spec, it will cause a lot of unnecessary work in the
near-ish future.
I propose to let Polaris have it's own and **completely** independent
OpenAPI spec and replace 'polaris-catalog-service.yaml' to ensure that
Polaris's OpenAPI spec can never be broken by any Iceberg OpenAPI change.
Robert
[1]
https://github.com/apache/polaris/blob/d33454874f69f952da2dfb301d0330d6e8d2296e/spec/iceberg-rest-catalog-open-api.yaml
[2]
https://github.com/apache/polaris/blob/d33454874f69f952da2dfb301d0330d6e8d2296e/spec/polaris-catalog-service.yaml
[3]
https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/open-api/rest-catalog-open-api.yaml
[4]
https://jakarta.ee/specifications/servlet/6.0/jakarta-servlet-spec-6.0#uri-path-canonicalization
[5] https://github.com/apache/polaris/pull/906
[6] https://github.com/apache/polaris/pull/936
[7] https://github.com/apache/polaris/pull/808
[8] https://github.com/apache/polaris/pull/1150
--
Robert Stupp
@snazy
--
Robert Stupp
@snazy