Excellent suggestion, Ajantha. I've created a milestone: https://github.com/apache/iceberg/milestone/46

I would also refrain from removing the endpoint without having an alternative. Also, we have to go through a deprecation process since this is all implemented in Java, Python, and Rust.

> As already shared on the mailing list, I'm working on a PR to have interfaces with JAXRS/Swagger annotations to generate OpenAPI JSON/YAML with the swagger-gradle-plugin.

That would be really cool! I also spent some time on this when implementing the REST catalog for PyIceberg, but ran into some limitations of OpenAPI in expressing some of the definitions. A lot of the context around this is captured in this issue <https://github.com/apache/iceberg/issues/6798>; maybe it helps when working on that.

Kind regards,
Fokko

On Fri, May 31, 2024 at 08:33, Ajantha Bhat <ajanthab...@gmail.com> wrote:

> > In general, we find fewer, more focused proposals allow for better discussion and faster resolution.
>
> Let us have a milestone called "REST catalog v2 spec" (similar to https://github.com/apache/iceberg/milestone/42) and keep the multiple smaller proposals organized under that.
>
> - Ajantha
>
> On Thu, May 30, 2024 at 9:46 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> Thanks JB,
>>
>> I do feel like the discussion around OAuth2, SigV4, etc. is a big enough topic that we wouldn't want to bundle it with other proposed changes. I think the discussion around both what is included in the spec and what the reference implementations will be for each of these protocols will be a rather large topic.
>>
>> In general, we find fewer, more focused proposals allow for better discussion and faster resolution.
>>
>> Can you split that section out into a separate document and create an issue for the auth changes?
>>
>> Thanks,
>> -Dan
>>
>> On Thu, May 30, 2024 at 4:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>>> Hi Jack,
>>>
>>> Here are my comments:
>>>
>>> 1. I don't think we should remove the oauth2 endpoint directly like this.
>>> I would first deprecate the endpoint and plan the removal in the spec v2.
>>> 4. I agree, and it has to be pluggable.
>>>
>>> I updated the REST Spec v2 proposal including first steps on v1:
>>>
>>> https://docs.google.com/document/d/1JUtFpdEoa6IAKt1EzJi_re0PUbh56XnfUtRe5WAfl0s/edit?usp=sharing
>>>
>>> As already shared on the mailing list, I'm working on a PR to have interfaces with JAXRS/Swagger annotations to generate OpenAPI JSON/YAML with the swagger-gradle-plugin.
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>> On Wed, May 29, 2024 at 8:03 PM Jack Ye <yezhao...@gmail.com> wrote:
>>> >
>>> > Just to reiterate my points discussed in the community sync here: the more I think about it the more I agree the OAuth endpoint should be removed from the REST spec. Even though the endpoint is optional, and even if we do not care about the security concerns, it still provides users an impression that the endpoint "should" be implemented, or "is the preferred authentication mechanism". And as we have found out, the server capability proposal does not cover this case since this is the first endpoint to hit before the GetConfig endpoint.
>>> >
>>> > As Ryan said, if we want to do that we need an alternative plan. I don't have anything concrete, but here is my line of thought:
>>> >
>>> > 1. remove OAuth2 endpoint from the "REST OpenAPI spec"
>>> >
>>> > 2. create a client-side interface (in each language) that different authentication mechanisms can be plugged in to talk to the REST catalog
>>> >
>>> > 3. refactor and make OAuth2 an implementation of that interface. I can also help with doing the same for AWS Sigv4, and the community can further support some additional ones like Kerberos, SAML, Google SSO, etc. based on the individual use cases.
>>> >
>>> > 4. turn 2 + 3 into a "REST catalog authentication spec" that documents about all the supported authentication mechanisms and their defaults.
>>> > For OAuth2, the default is to have the auth server at the same endpoint as the resource server for backwards compatibility, but that is a configurable property, and we could recommend not to do that based on security concerns.
>>> >
>>> > Best,
>>> > Jack Ye
>>> >
>>> > On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>>> >>
>>> >> Wondering if the auth endpoints can be separated out to a separate OpenAPI spec file. Then we still have some reference for interactions with auth server and make it clear it is not required as part of the REST catalog server. In most enterprise environments, auth server is likely a separate server.
>>> >>
>>> >> On Tue, May 28, 2024 at 1:25 PM Alex Dutra <alex.du...@dremio.com.invalid> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>>> On point 4, isn't that possible today, Can't that be achieved with the current token exchange approach, and the internal implementation of the endpoint?
>>> >>>
>>> >>> Unfortunately, no. Token exchange is not widely adopted yet: for example, Keycloak has only partial support for it, and Authelia, or Authentik, have no support for it at all.
>>> >>>
>>> >>> This, and a few other technical issues with the current internals of the REST client, makes it nearly impossible to achieve a good integration of Iceberg REST with the majority of popular OSS authorization servers.
>>> >>>
>>> >>> I am planning to start another email thread to discuss these practicalities, but let's first reach consensus on the broader security issues voiced here, before we tackle the details.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Alex Dutra
>>> >>>
>>> >>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io> wrote:
>>> >>>>
>>> >>>> I disagree with removing "/v1/oauth/tokens" and I think I also disagree with the premise that implementing that endpoint is required, but I can understand how that's not clear in the spec. I think we can address the required vs non-required discussion with the capabilities PR.
>>> >>>>
>>> >>>> It seems like another part of what's driving this discussion is some concern around how do we enforce REST catalog implementations which do implement this endpoint to make sure that the implementation is secure (for example to avoid the MITM example that was brought up). This is ultimately a runtime detail. To me it seems like if we make it clear that such an endpoint should be implemented respecting OAuth2 standards, and we know that OAuth2 compliance requires avoiding that MITM situation, then runtime implementations should just follow the spec there
>>> >>>>
>>> >>>> >3. Enable flexibility for Iceberg REST servers to opt for other authorization mechanisms than OAuth 2.0.
>>> >>>> >4. Enable REST servers to opt for integrating with any standard OAuth2 / OIDC provider (e.g. Okta, Keycloak, Authelia).
>>> >>>>
>>> >>>> I agree with both of these points; again I don't think the intention is Oauth2 is the only way, but I think the capabilities PR will make that even more clear.
>>> >>>> On point 4, isn't that possible today, Can't that be achieved with the current token exchange approach, and the internal implementation of the endpoint? Sorry if I missed that explanation.
>>> >>>>
>>> >>>> Thanks,
>>> >>>>
>>> >>>> Amogh Jahagirdar
>>> >>>>
>>> >>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Not an expert on authentication, but reading from the context, I agree that it’s not a good practice to use a resource server as a token server. The resource server would need to securely handle and store credentials or tokens, increasing the risk of credential theft or leakage. Making the token endpoint optional will mitigate the issue a bit. But if we want to disable it completely, it's better to do it now to prevent any issues and migration costs in the future. Can we have a consensus on it?
>>> >>>>>
>>> >>>>> I would prefer to deprecate it to prevent any intentional and unintentional misuse. We will also need to change the clients since they connect to the endpoint by default.
>>> >>>>>
>>> >>>>> Yufei
>>> >>>>>
>>> >>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> Sounds like we should try to finalize a consensus around https://github.com/apache/iceberg/pull/9940, so that we make it very clear what APIs/features are optional.
>>> >>>>>>
>>> >>>>>> -Jack
>>> >>>>>>
>>> >>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> >>>>>>>
>>> >>>>>>> Hey Robert,
>>> >>>>>>>
>>> >>>>>>> Sorry for the late reply as I was out last week. I'm not an OAuth guru either, but some context from my end.
>>> >>>>>>>
>>> >>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>>> >>>>>>>
>>> >>>>>>> In an earlier discussion, it was agreed that the resource server can also function as the authorization server. But the roles can also be separate.
>>> >>>>>>>
>>> >>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg client, make the existing client parameter “oauth2-server-uri” mandatory. The Iceberg REST catalog must fail to initialize if the “oauth2-server-uri” parameter is not defined.
>>> >>>>>>>
>>> >>>>>>> It can also be that there is no authentication in the case of an internal REST catalog. For example, the iceberg-rest-image that we use for integration tests in PyIceberg.
>>> >>>>>>>
>>> >>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate that a catalog implementation responds to requests to produce Auth Tokens (since the REST spec v1 defines a /v1/oauth/tokens endpoint, current implementations have to take deliberate actions when responding to those requests, whether with successful token responses or with “access denied” or “unsupported” responses).
>>> >>>>>>>
>>> >>>>>>> The `/v1/oauth/tokens` endpoint is optional.
>>> >>>>>>>
>>> >>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>>> >>>>>>>
>>> >>>>>>> I fully agree!
>>> >>>>>>>
>>> >>>>>>>> Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’ endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing clear text credentials, if clients are misconfigured (humans do make mistakes) - it’s a non-zero risk - bad actors can implement/intercept that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured clients to send credentials.
>>> >>>>>>>
>>> >>>>>>> I think the wording is chosen badly. It should not send any credentials, but the code (as in this example by GCS).
>>> >>>>>>>
>>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I suppose, in REST Catalog implementations that support that auth method, the /v1/oauth/tokens Catalog REST endpoint is redundant.
>>> >>>>>>>
>>> >>>>>>> There are other cloud providers next to AWS.
>>> >>>>>>>
>>> >>>>>>> Kind regards,
>>> >>>>>>> Fokko
>>> >>>>>>>
>>> >>>>>>> On Thu, May 23, 2024 at 15:49, Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>> >>>>>>>>
>>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I suppose, in REST Catalog implementations that support that auth method, the /v1/oauth/tokens Catalog REST endpoint is redundant.
>>> >>>>>>>>
>>> >>>>>>>> Cheers,
>>> >>>>>>>> Dmitri.
>>> >>>>>>>>
>>> >>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> I do not know enough details about OAuth to make comments about this issue, but just regarding the statement "OAuth2 is the only mechanism supported by the Iceberg client", AWS SigV4 auth is also supported, at least in the Java client implementation. It would be nice if we formalize that in the spec, at least define it as a generic authorization header.
>>> >>>>>>>>>
>>> >>>>>>>>> Best,
>>> >>>>>>>>> Jack Ye
>>> >>>>>>>>>
>>> >>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Hi all,
>>> >>>>>>>>>>
>>> >>>>>>>>>> Iceberg REST implementations, either accessible on the public internet or inside an organization, are usually being secured using appropriate authorization mechanisms. The Nessie team is looking at implementing the Iceberg REST specification and has some questions around the security endpoint(s) defined in the spec.
>>> >>>>>>>>>>
>>> >>>>>>>>>> TL;DR we have questions (potentially concerns) about having the ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance issues, and imposes how authorization should be implemented.
>>> >>>>>>>>>> * As an open table format, it would be good for Iceberg to focus on the table format / catalog and not how authorization is implemented. The existence of an OAuth endpoint pushes implementations to adopt authorization using only OAuth, whereas the implementers might choose several other ways to implement authorization (e.g. SAML). In our opinion the spec should leave it open to the implementation to decide how authorization will be implemented.
>>> >>>>>>>>>> * The existence of that endpoint also pushes operators of Iceberg REST endpoints into the authorization service business.
>>> >>>>>>>>>> * Clients might expose their clear-text credentials to the wrong service, if the (correct) OAuth endpoint is not configured (humans do make mistakes).
>>> >>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for ‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”, which is not fully compliant with the OAuth 2.0 specification.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Our goals with this discussion are:
>>> >>>>>>>>>> 1. Secure the Iceberg REST specification by preventing accidental misuse/misimplementation.
>>> >>>>>>>>>> 2. Prevent Iceberg REST from dictating the “authorization server specifics”.
>>> >>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for other authorization mechanisms than OAuth 2.0.
>>> >>>>>>>>>> 4. Enable REST servers to opt for integrating with any standard OAuth2 / OIDC provider (e.g. Okta, Keycloak, Authelia).
>>> >>>>>>>>>>
>>> >>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the industry. It defines a secure mechanism to access resources (here: Iceberg REST endpoints). The most important aspect for OAuth 2.0 resources is that (Iceberg REST) servers do not (have to) support password authentication, especially considering the security weaknesses inherent in passwords. Compromised passwords result in compromised data protected by that password.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of these are:
>>> >>>>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>>> >>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource server, only to the authorization server. (“Unlike access tokens, refresh tokens are intended for use only with authorization servers and are never sent to resource servers.”, quoted from section 1.5 of the OAuth RFC 6749.)
>>> >>>>>>>>>> * While the OAuth RFC states "The authorization server may be the same server as the resource server or a separate entity", this should not be mandated, i.e. the spec should be open to supporting implementations that have the authorization server and resource server co-located as well as separate.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’, whose description reads “To exchange client credentials (client ID and secret) for an access token.
>>> >>>>>>>>>> This uses the client credentials flow.” [3]. Technically: client ID and secret are submitted using an HTTP POST request to that Iceberg REST endpoint.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily be seen as a hard requirement. In order to implement this in compliance with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the authorization server. If users do not (want to) implement an authorization server, the only way to implement this ‘/v1/oauth/tokens’ endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual authorization server, which means that this proxy technically becomes a “man in the middle” - knowing both all credentials and all involved tokens.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’ endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing clear text credentials, if clients are misconfigured (humans do make mistakes) - it’s a non-zero risk - bad actors can implement/intercept that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured clients to send credentials.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To exchange a client token and an identity token for a more specific access token. This uses the token exchange flow.” and “To exchange an access token for one with the same claims and a refreshed expiration period. This uses the token exchange flow.” Both usages should and can be implemented differently.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Apache Iceberg, as a table format project, should recommend protecting sensitive information. But Iceberg should not mandate _how_ that protection is implemented - but the Iceberg REST specification does effectively mandate OAuth 2.0, because other Iceberg REST endpoints do refer/require OAuth 2.0 specifics. Users that want to use other mechanisms, because they are forced to do so by their organization, would be locked out of Iceberg REST.
>>> >>>>>>>>>>
>>> >>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option - for the sake of openness for the project and flexibility for the server implementations.
>>> >>>>>>>>>>
>>> >>>>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate that a catalog implementation responds to requests to produce Auth Tokens (since the REST spec v1 defines a /v1/oauth/tokens endpoint, current implementations have to take deliberate actions when responding to those requests, whether with successful token responses or with “access denied” or “unsupported” responses).
>>> >>>>>>>>>>
>>> >>>>>>>>>> We propose the following actions:
>>> >>>>>>>>>> 1. Immediate mitigation:
>>> >>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from Iceberg’s OpenAPI spec w/o replacement.
>>> >>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg client, make the existing client parameter “oauth2-server-uri” mandatory. The Iceberg REST catalog must fail to initialize if the “oauth2-server-uri” parameter is not defined.
>>> >>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
>>> >>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any Iceberg REST Catalog endpoint, both via the "token" property or with any of the "urn:ietf:params:oauth:token-type:*" properties.
>>> >>>>>>>>>> 2. As a follow up: We’d propose a couple of implementation fixes and changes and test improvements.
>>> >>>>>>>>>> 3. As a follow up: Define a discovery mechanism for both the Iceberg REST base URI and OAuth 2.0 endpoints/discovery, which allows users to use a single URI to securely access Iceberg REST endpoints.
>>> >>>>>>>>>> 4. As a follow up: Not new, but we also want to improve the Iceberg REST specification via the “new” REST proposal.
>>> >>>>>>>>>>
>>> >>>>>>>>>> We do not think that adding recommendations to inline-documentation is enough to fully mitigate the above concerns.
>>> >>>>>>>>>>
>>> >>>>>>>>>> References:
>>> >>>>>>>>>>
>>> >>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework, https://datatracker.ietf.org/doc/html/rfc6749
>>> >>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec - https://github.com/apache/iceberg/pull/4771
>>> >>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to the REST spec - https://github.com/apache/iceberg/pull/4843
>>> >>>>>>>>>>
>>> >>>>>>>>>> --
>>> >>>>>>>>>> Robert Stupp
>>> >>>>>>>>>> @snazy
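
For illustration, here is one way to read Jack's points 2 and 3 (a client-side interface that authentication mechanisms plug into, with OAuth2 as just one implementation), combined with proposal 1.2 (fail to initialize when “oauth2-server-uri” is missing). It is only a minimal Python sketch and not code from any Iceberg client: the class names `AuthProvider`, `NoAuth` and `OAuth2ClientCredentials`, the methods, and the property names "client-id" and "client-secret" are all hypothetical. Only “oauth2-server-uri” comes from the thread. The sketch does no network I/O; it only builds the token request.

```python
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlencode


class AuthProvider(ABC):
    """Hypothetical plug-in point for one authentication mechanism."""

    @abstractmethod
    def headers(self) -> Dict[str, str]:
        """Return HTTP headers to attach to each REST catalog request."""


class NoAuth(AuthProvider):
    """No authentication, e.g. for an internal catalog used in tests."""

    def headers(self) -> Dict[str, str]:
        return {}


class OAuth2ClientCredentials(AuthProvider):
    """OAuth2 as one implementation among others (sketch, no network I/O)."""

    def __init__(self, properties: Mapping[str, str]) -> None:
        # Proposal 1.2: fail to initialize without an explicit authorization
        # server; there is no fallback to the catalog's own URI.
        server_uri = properties.get("oauth2-server-uri")
        if not server_uri:
            raise ValueError("'oauth2-server-uri' must be configured")
        self.server_uri = server_uri
        # "client-id" and "client-secret" are hypothetical property names.
        self._client_id = properties["client-id"]
        self._client_secret = properties["client-secret"]
        self._access_token: Optional[str] = None

    def token_request(self) -> Tuple[str, str]:
        """Build the client credentials request (RFC 6749, section 4.4).

        Returns the target URL and the form-encoded body. The target is
        always the authorization server, never the resource server.
        """
        body = urlencode(
            {
                "grant_type": "client_credentials",
                "client_id": self._client_id,
                "client_secret": self._client_secret,
            }
        )
        return self.server_uri, body

    def accept_token(self, access_token: str) -> None:
        """Store the access token taken from the authorization server's response."""
        self._access_token = access_token

    def headers(self) -> Dict[str, str]:
        if self._access_token is None:
            raise RuntimeError("no access token obtained yet")
        # Only the access token is sent to the catalog (resource server).
        return {"Authorization": f"Bearer {self._access_token}"}
```

With a shape like this, a catalog client would only ever call `headers()`, so SigV4 or another mechanism could be added as a further `AuthProvider` without touching the REST spec, and the client credentials would go only to the configured authorization server.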