> In general, we find fewer, more focused proposals allow for better discussion and faster resolution.
Let us have a milestone called "REST catalog v2 spec" (similar to https://github.com/apache/iceberg/milestone/42) and keep the multiple smaller proposals organized under that.

- Ajantha

On Thu, May 30, 2024 at 9:46 PM Daniel Weeks <dwe...@apache.org> wrote:

> Thanks JB,
>
> I do feel like the discussion around OAuth2, SigV4, etc. is a big enough topic that we wouldn't want to bundle it with other proposed changes. I think the discussion around both what is included in the spec and what the reference implementations will be for each of these protocols will be a rather large topic.
>
> In general, we find fewer, more focused proposals allow for better discussion and faster resolution.
>
> Can you split that section out into a separate document and create an issue for the auth changes?
>
> Thanks,
> -Dan
>
> On Thu, May 30, 2024 at 4:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
>> Hi Jack,
>>
>> Here's my comments:
>>
>> 1. I don't think we should remove the oauth2 endpoint directly like this. I would first deprecate the endpoint and plan the removal in the spec v2.
>> 4. I agree, and it has to be pluggable.
>>
>> I updated the REST Spec v2 proposal including first steps on v1:
>>
>> https://docs.google.com/document/d/1JUtFpdEoa6IAKt1EzJi_re0PUbh56XnfUtRe5WAfl0s/edit?usp=sharing
>>
>> As already shared on the mailing list, I'm working on a PR to have interfaces with JAXRS/Swagger annotations to generate OpenAPI JSON/YAML with the swagger-gradle-plugin.
>>
>> Thanks,
>> Regards
>> JB
>>
>> On Wed, May 29, 2024 at 8:03 PM Jack Ye <yezhao...@gmail.com> wrote:
>> >
>> > Just to reiterate my points discussed in the community sync here: the more I think about it the more I agree the OAuth endpoint should be removed from the REST spec. Even though the endpoint is optional, and even if we do not care about the security concerns, it still provides users an impression that the endpoint "should" be implemented, or "is the preferred authentication mechanism". And as we have found out, the server capability proposal does not cover this case since this is the first endpoint to hit before the GetConfig endpoint.
>> >
>> > As Ryan said, if we want to do that we need an alternative plan. I don't have anything concrete, but here is my line of thought:
>> >
>> > 1. remove OAuth2 endpoint from the "REST OpenAPI spec"
>> >
>> > 2. create a client-side interface (in each language) that different authentication mechanisms can be plugged in to talk to the REST catalog
>> >
>> > 3. refactor and make OAuth2 an implementation of that interface. I can also help with doing the same for AWS Sigv4, and the community can further support some additional ones like Kerberos, SAML, Google SSO, etc. based on the individual use cases.
>> >
>> > 4. turn 2 + 3 into a "REST catalog authentication spec" that documents about all the supported authentication mechanisms and their defaults. For OAuth2, the default is to have the auth server at the same endpoint as the resource server for backwards compatibility, but that is a configurable property, and we could recommend not to do that based on security concerns.
>> >
>> > Best,
>> > Jack Ye
>> >
>> > On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:
>> >>
>> >> Wondering if the auth endpoints can be separated out to a separate OpenAPI spec file. Then we still have some reference for interactions with auth server and make it clear it is not required as part of the REST catalog server. In most enterprise environments, auth server is likely a separate server.
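
For illustration only, steps 2 and 3 of Jack's outline (a client-side interface that authentication mechanisms plug into, with OAuth2 as one implementation of it) might look roughly like the Python sketch below. Nothing here is Iceberg API: `AuthProvider`, `NoAuth`, `OAuth2BearerAuth`, `create_auth_provider` and the `rest.auth.type` property are hypothetical names, and token retrieval is injected as a callable so the sketch makes no network calls. Only the `oauth2-server-uri` property name is taken from the thread.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


class AuthProvider(ABC):
    """Hypothetical plug-in point: decorates outgoing request headers."""

    @abstractmethod
    def apply(self, headers: Mapping[str, str]) -> Dict[str, str]:
        """Return a copy of the headers with authentication data added."""


class NoAuth(AuthProvider):
    """For catalogs that need no authentication (e.g. internal test setups)."""

    def apply(self, headers: Mapping[str, str]) -> Dict[str, str]:
        return dict(headers)


@dataclass
class OAuth2BearerAuth(AuthProvider):
    """OAuth2 as one implementation among others.

    token_uri points at the authorization server; fetch_token is an
    injected callable (assumption) that obtains an access token from it.
    """

    token_uri: str
    fetch_token: Callable[[str], str]

    def apply(self, headers: Mapping[str, str]) -> Dict[str, str]:
        out = dict(headers)
        out["Authorization"] = f"Bearer {self.fetch_token(self.token_uri)}"
        return out


def create_auth_provider(
    props: Mapping[str, str], fetch_token: Callable[[str], str]
) -> AuthProvider:
    """Pick a provider from client properties.

    'rest.auth.type' is a made-up property name, and defaulting to 'none'
    is chosen purely for this sketch.
    """
    auth_type = props.get("rest.auth.type", "none")
    if auth_type == "none":
        return NoAuth()
    if auth_type == "oauth2":
        return OAuth2BearerAuth(props["oauth2-server-uri"], fetch_token)
    raise ValueError(f"unsupported auth type: {auth_type}")
```

Other mechanisms mentioned in the thread (SigV4, Kerberos, SAML) would be further `AuthProvider` subclasses under this shape; whether a header-only hook is enough for request-signing schemes is an open design question the sketch does not settle.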
>> >>
>> >> On Tue, May 28, 2024 at 1:25 PM Alex Dutra <alex.du...@dremio.com.invalid> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>>> On point 4, isn't that possible today, Can't that be achieved with the current token exchange approach, and the internal implementation of the endpoint?
>> >>>
>> >>> Unfortunately, no. Token exchange is not widely adopted yet: for example, Keycloak has only partial support for it, and Authelia, or Authentik, have no support for it at all.
>> >>>
>> >>> This, and a few other technical issues with the current internals of the REST client, makes it nearly impossible to achieve a good integration of Iceberg REST with the majority of popular OSS authorization servers.
>> >>>
>> >>> I am planning to start another email thread to discuss these practicalities, but let's first reach consensus on the broader security issues voiced here, before we tackle the details.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Alex Dutra
>> >>>
>> >>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io> wrote:
>> >>>>
>> >>>> I disagree with removing "/v1/oauth/tokens" and I think I also disagree with the premise that implementing that endpoint is required, but I can understand how that's not clear in the spec. I think we can address the required vs non-required discussion with the capabilities PR.
>> >>>>
>> >>>> It seems like another part of what's driving this discussion is some concern around how do we enforce REST catalog implementations which do implement this endpoint to make sure that the implementation is secure (for example to avoid the MITM example that was brought up). This is ultimately a runtime detail. To me it seems like if we make it clear that such an endpoint should be implemented respecting OAuth2 standards, and we know that OAuth2 compliance requires avoiding that MITM situation, then runtime implementations should just follow the spec there
>> >>>>
>> >>>> >3. Enable flexibility for Iceberg REST servers to opt for other authorization mechanisms than OAuth 2.0.
>> >>>> >4. Enable REST servers to opt for integrating with any standard OAuth2 / OIDC provider (e.g. Okta, Keycloak, Authelia).
>> >>>>
>> >>>> I agree with both of these points; again I don't think the intention is Oauth2 is the only way, but I think the capabilities PR will make that even more clear.
>> >>>> On point 4, isn't that possible today, Can't that be achieved with the current token exchange approach, and the internal implementation of the endpoint? Sorry if I missed that explanation.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Amogh Jahagirdar
>> >>>>
>> >>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com> wrote:
>> >>>>>
>> >>>>> Not an expert on authentication, but reading from the context, I agree that it’s not a good practice to use a resource server as a token server. The resource server would need to securely handle and store credentials or tokens, increasing the risk of credential theft or leakage. Making the token endpoint optional will mitigate the issue a bit. But if we want to disable it completely, it's better to do it now to prevent any issues and migration costs in the future. Can we have a consensus on it?
>> >>>>>
>> >>>>> I would prefer to deprecate it to prevent any intentional and unintentional misuse. We will also need to change the clients since it connects to the endpoint by default.
>> >>>>>
>> >>>>> Yufei
>> >>>>>
>> >>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Sounds like we should try to finalize a consensus around https://github.com/apache/iceberg/pull/9940, so that we make it very clear what APIs/features are optional.
>> >>>>>>
>> >>>>>> -Jack
>> >>>>>>
>> >>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> wrote:
>> >>>>>>>
>> >>>>>>> Hey Robert,
>> >>>>>>>
>> >>>>>>> Sorry for the late reply as I was out last week. I'm not an OAuth guru either, but some context from my end.
>> >>>>>>>
>> >>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>> >>>>>>>
>> >>>>>>> In an earlier discussion, it was agreed that the resource server can also function as the authorization server. But the roles can also be separate.
>> >>>>>>>
>> >>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg client, make the existing client parameter “oauth2-server-uri” mandatory. The Iceberg REST catalog must fail to initialize if the “oauth2-server-uri” parameter is not defined.
>> >>>>>>>
>> >>>>>>> It can also be that there is no authentication in the case of an internal REST catalog. For example, the iceberg-rest-image that we use for integration tests in PyIceberg.
>> >>>>>>>
>> >>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate that a catalog implementation responds to requests to produce Auth Tokens (since the REST spec v1 defines a /v1/tokens endpoint, current implementations have to take deliberate actions when responding to those requests, whether with successful token responses or with “access denied” or “unsupported” responses).
>> >>>>>>>
>> >>>>>>> The `/v1/tokens` endpoint is optional.
>> >>>>>>>
>> >>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>> >>>>>>>
>> >>>>>>> I fully agree!
>> >>>>>>>
>> >>>>>>>> Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’ endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing clear text credentials, if clients are misconfigured (humans do make mistakes) - it’s a non-zero risk - bad actors can implement/intercept that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured clients to send credentials.
>> >>>>>>>
>> >>>>>>> I think the wording is chosen badly. It should not send any credentials, but the code (as in this example by GCS).
>> >>>>>>>
>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I suppose, in REST Catalog implementations that support that auth method, the /v1/oauth/token Catalog REST endpoint is redundant.
>> >>>>>>>
>> >>>>>>> There are other cloud providers next to AWS.
>> >>>>>>>
>> >>>>>>> Kind regards,
>> >>>>>>> Fokko
>> >>>>>>>
>> >>>>>>> On Thu, May 23, 2024 at 3:49 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>> >>>>>>>>
>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I suppose, in REST Catalog implementations that support that auth method, the /v1/oauth/token Catalog REST endpoint is redundant.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Dmitri.
>> >>>>>>>>
>> >>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> I do not know enough details about OAuth to make comments about this issue, but just regarding the statement "OAuth2 is the only mechanism supported by the Iceberg client", AWS Sigv4 auth is also supported, at least in the Java client implementation. It would be nice if we formalize that in the spec, at least define it as a generic authorization header.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Jack Ye
>> >>>>>>>>>
>> >>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>>
>> >>>>>>>>>> Iceberg REST implementations, either accessible on the public internet or inside an organization, are usually being secured using appropriate authorization mechanisms. The Nessie team is looking at implementing the Iceberg REST specification and have some questions around the security endpoint(s) defined in the spec.
>> >>>>>>>>>>
>> >>>>>>>>>> TL;DR we have questions (potentially concerns) about having the ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance issues, and imposes how authorization should be implemented.
>> >>>>>>>>>> * As an open table format, it would be good for Iceberg to focus on the table format / catalog and not how authorization is implemented. The existence of an OAuth endpoint pushes implementations to adopt authorization using only OAuth, whereas the implementers might choose several other ways to implement authorization (e.g. SAML). In our opinion the spec should leave it open to the implementation to decide how authorization will be implemented.
>> >>>>>>>>>> * The existence of that endpoint also pushes operators of Iceberg REST endpoints into the authorization service business.
>> >>>>>>>>>> * Clients might expose their clear-text credentials to the wrong service, if the (correct) OAuth endpoint is not configured (humans do make mistakes).
>> >>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for ‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”, which is not fully compliant with the OAuth 2.0 specification.
>> >>>>>>>>>>
>> >>>>>>>>>> Our goals with this discussion are:
>> >>>>>>>>>> 1. Secure the Iceberg REST specification by preventing accidental misuse/misimplementation.
>> >>>>>>>>>> 2. Prevent that Iceberg REST to get into dictating the “authorization server specifics”.
>> >>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for other authorization mechanisms than OAuth 2.0.
>> >>>>>>>>>> 4. Enable REST servers to opt for integrating with any standard OAuth2 / OIDC provider (e.g. Okta, Keycloak, Authelia).
>> >>>>>>>>>>
>> >>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the industry. It defines a secure mechanism to access resources (here: Iceberg REST endpoints). The most important aspect for OAuth 2.0 resources is that (Iceberg REST) servers do not (have to) support password authentication, especially considering the security weaknesses inherent in passwords. Compromised passwords result in compromised data protected by that password.
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of these are:
>> >>>>>>>>>> * Credentials (for example username/password) must _never_ be sent to the resource server, only to the authorization server.
>> >>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource server, only to the authorization server. (“Unlike access tokens, refresh tokens are intended for use only with authorization servers and are never sent to resource servers.”, cite from section 1.5 of the OAuth RFC 6749.)
>> >>>>>>>>>> * While the OAuth RFC states "The authorization server may be the same server as the resource server or a separate entity", this should not be mandated. i.e the spec should be open to supporting implementations that have the authorization server and resource server co-located as well as separate.
>> >>>>>>>>>>
>> >>>>>>>>>> The Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’, intentionally marked to “To exchange client credentials (client ID and secret) for an access token. This uses the client credentials flow.” [3]. Technically: client ID and secret are submitted using a HTTP POST request to that Iceberg REST endpoint.
>> >>>>>>>>>>
>> >>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily be seen as a hard requirement. In order to implement this in compliance with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the authorization server. If users do not (want to) implement an authorization server, the only way to implement this ‘/v1/oauth/tokens’ endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual authorization server, which means, that this proxy technically becomes a “man in the middle” - knowing both all credentials and all involved tokens.
>> >>>>>>>>>>
>> >>>>>>>>>> Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’ endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing clear text credentials, if clients are misconfigured (humans do make mistakes) - it’s a non-zero risk - bad actors can implement/intercept that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured clients to send credentials.
>> >>>>>>>>>>
>> >>>>>>>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To exchange a client token and an identity token for a more specific access token. This uses the token exchange flow.” and “To exchange an access token for one with the same claims and a refreshed expiration period This uses the token exchange flow.” Both usages should and can be implemented differently.
>> >>>>>>>>>>
>> >>>>>>>>>> Apache Iceberg, as a table format project, should recommend protecting sensitive information. But Iceberg should not mandate _how_ that protection is implemented - but the Iceberg REST specification does effectively mandate OAuth 2.0, because other Iceberg REST endpoints do refer/require OAuth 2.0 specifics. Users that want to use other mechanisms, because they are forced to do so by their organization, would be locked out of Iceberg REST.
>> >>>>>>>>>>
>> >>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option - for the sake of openness for the project and flexibility for the server implementations.
>> >>>>>>>>>>
>> >>>>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate that a catalog implementation responds to requests to produce Auth Tokens (since the REST spec v1 defines a /v1/tokens endpoint, current implementations have to take deliberate actions when responding to those requests, whether with successful token responses or with “access denied” or “unsupported” responses).
>> >>>>>>>>>>
>> >>>>>>>>>> We propose the following actions:
>> >>>>>>>>>> 1. Immediate mitigation:
>> >>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from the Iceberg’s OpenAPI spec w/o replacement.
>> >>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg client, make the existing client parameter “oauth2-server-uri” mandatory. The Iceberg REST catalog must fail to initialize if the “oauth2-server-uri” parameter is not defined.
>> >>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
>> >>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any Iceberg REST Catalog endpoint, both via the "token" property or with any of the "urn:ietf:params:oauth:token-type:*" properties.
>> >>>>>>>>>> 2. As a follow up: We’d propose a couple of implementation fixes and changes and test improvements.
>> >>>>>>>>>> 3. As a follow up: Define a discovery mechanism for both the Iceberg REST base URI and OAuth 2.0 endpoints/discovery, which allows users to use a single URI to securely access Iceberg REST endpoints.
>> >>>>>>>>>> 4. As a follow up: Not new, but we also want to improve the Iceberg REST specification via the “new” REST proposal.
>> >>>>>>>>>>
>> >>>>>>>>>> We do not think that adding recommendations to inline-documentation is enough to fully mitigate the above concerns.
>> >>>>>>>>>>
>> >>>>>>>>>> References:
>> >>>>>>>>>>
>> >>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework, https://datatracker.ietf.org/doc/html/rfc6749
>> >>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec - https://github.com/apache/iceberg/pull/4771
>> >>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to the REST spec - https://github.com/apache/iceberg/pull/4843
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Robert Stupp
>> >>>>>>>>>> @snazy
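
As a rough illustration of proposed actions 1.2 and 1.3 in Robert's message (make `oauth2-server-uri` mandatory, fail catalog initialization without it, and never fall back to `/v1/oauth/tokens`), a client-side check could look like the Python sketch below. This is not the behaviour of any existing Iceberg client: `CatalogConfigError` and `resolve_token_endpoint` are hypothetical names, and rejecting non-absolute URIs is an extra assumption that the thread does not ask for.

```python
from typing import Mapping
from urllib.parse import urlparse

# Property name as quoted in the thread.
OAUTH2_SERVER_URI = "oauth2-server-uri"


class CatalogConfigError(ValueError):
    """Hypothetical error type for an unusable client configuration."""


def resolve_token_endpoint(properties: Mapping[str, str]) -> str:
    """Return the configured authorization server URI or refuse to start.

    There is deliberately no fallback that derives a token endpoint from
    the catalog URI (proposed action 1.3).
    """
    uri = properties.get(OAUTH2_SERVER_URI, "").strip()
    if not uri:
        raise CatalogConfigError(
            f"'{OAUTH2_SERVER_URI}' must be set; refusing to guess a token endpoint"
        )
    parsed = urlparse(uri)
    # Assumption of this sketch: only absolute URIs are accepted, so a
    # relative path can never resolve against the resource server.
    if not parsed.scheme or not parsed.netloc:
        raise CatalogConfigError(
            f"'{OAUTH2_SERVER_URI}' must be an absolute URI, got: {uri!r}"
        )
    return uri
```

Failing at initialization rather than at the first token request means a misconfigured client never gets as far as sending credentials anywhere, which is the risk the original message describes.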