Sounds like we should try to finalize a consensus around https://github.com/apache/iceberg/pull/9940, so that we make it very clear what APIs/features are optional.
-Jack On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> wrote: > Hey Robert, > > Sorry for the late reply as I was out last week. I'm not an OAuth guru > either, but some context from my end. > > * Credentials (for example username/password) must _never_ be sent to >> the resource server, only to the authorization server. > > > In an earlier discussion <https://github.com/apache/iceberg/pull/8976>, > it was agreed that the resource server can also function as the > authorization server. But the roles can also be separate. > > 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg >> client, make the existing client parameter “oauth2-server-uri” >> mandatory. The Iceberg REST catalog must fail to initialize if the >> “oauth2-server-uri” parameter is not defined. > > > It can also be that there is no authentication in the case of an internal > REST catalog. For example, the iceberg-rest-image > <https://github.com/tabular-io/iceberg-rest-image> that we use for > integration tests in PyIceberg. > > We think that Apache Iceberg REST Catalog spec should not mandate that a >> catalog implementation responds to requests to produce Auth Tokens >> (since the REST spec v1 defines a /v1/tokens endpoint, current >> implementations have to take deliberate actions when responding to those >> requests, whether with successful token responses or with “access >> denied” or “unsupported” responses). > > The `/v1/tokens` endpoint is optional > <https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489> > . > > * Credentials (for example username/password) must _never_ be sent to >> the resource server, only to the authorization server. > > > I fully agree! > > Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’ >> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing >> clear text credentials, if clients are misconfigured (humans do make >> mistakes) - it’s a non-zero risk - bad actors can implement/intercept >> that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured >> clients to send credentials. > > > I think the wording is chosen badly. It should not send any credentials, > but the code (as in this example > <https://developers.google.com/identity/protocols/oauth2#installed> by > GCS). > > I think Jack makes a good point with AWS SigV4 Authentication. I suppose, >> in REST Catalog implementations that support that auth method, the >> /v1/oauth/token Catalog REST endpoint is redundant. >> > > There are other cloud providers next to AWS. > > Kind regards, > Fokko > > > > Op do 23 mei 2024 om 15:49 schreef Dmitri Bourlatchkov > <dmitri.bourlatch...@dremio.com.invalid>: > >> I think Jack makes a good point with AWS SigV4 Authentication. I suppose, >> in REST Catalog implementations that support that auth method, the >> /v1/oauth/token Catalog REST endpoint is redundant. >> >> Cheers, >> Dmitri. >> >> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> wrote: >> >>> I do not know enough details about OAuth to make comments about this >>> issue, but just regarding the statement "OAuth2 is the only mechanism >>> supported by the Iceberg client", AWS Sigv4 auth is also supported, at >>> least in the Java client implementation >>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>. >>> It would be nice if we formalize that in the spec, at least define it as a >>> generic authorization header. >>> >>> Best, >>> Jack Ye >>> >>> >>> >>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> wrote: >>> >>>> Hi all, >>>> >>>> Iceberg REST implementations, either accessible on the public internet >>>> or inside an organization, are usually being secured using appropriate >>>> authorization mechanisms. The Nessie team is looking at implementing >>>> the >>>> Iceberg REST specification and have some questions around the security >>>> endpoint(s) defined in the spec. >>>> >>>> TL;DR we have questions (potentially concerns) about having the >>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think >>>> that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance >>>> issues, and imposes how authorization should be implemented. >>>> * As an open table format, it would be good for Iceberg to focus on the >>>> table format / catalog and not how authorization is implemented. The >>>> existence of an OAuth endpoint pushes implementations to adopt >>>> authorization using only OAuth, whereas the implementers might choose >>>> several other ways to implement authorization (e.g. SAML). In our >>>> opinion the spec should leave it open to the implementation to decide >>>> how authorization will be implemented. >>>> * The existence of that endpoint also pushes operators of Iceberg REST >>>> endpoints into the authorization service business. >>>> * Clients might expose their clear-text credentials to the wrong >>>> service, if the (correct) OAuth endpoint is not configured (humans do >>>> make mistakes). >>>> * (Naive) Iceberg REST servers may proxy requests received for >>>> ‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”, >>>> which >>>> is not fully compliant with the OAuth 2.0 specification. >>>> >>>> Our goals with this discussion are: >>>> 1. Secure the Iceberg REST specification by preventing accidental >>>> misuse/misimplementation. >>>> 2. Prevent that Iceberg REST to get into dictating the “authorization >>>> server specifics”. >>>> 3. Enable flexibility for Iceberg REST servers to opt for other >>>> authorization mechanisms than OAuth 2.0. >>>> 4. Enable REST servers to opt for integrating with any standard OAuth2 >>>> / >>>> OIDC provider (e.g. Okta, Keycloak, Authelia). >>>> >>>> OAuth 2.0 [1] is one of the common standards accepted in the industry. >>>> It defines a secure mechanism to access resources (here: Iceberg REST >>>> endpoints). The most important aspect for OAuth 2.0 resources is that >>>> (Iceberg REST) servers do not (have to) support password >>>> authentication, >>>> especially considering the security weaknesses inherent in passwords. >>>> Compromised passwords result in compromised data protected by that >>>> password. >>>> >>>> Therefore OAuth 2.0 defines a set of strict rules. Some of these are: >>>> * Credentials (for example username/password) must _never_ be sent to >>>> the resource server, only to the authorization server. >>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource server, >>>> only to the authorization server. (“Unlike access tokens, refresh >>>> tokens >>>> are intended for use only with authorization servers and are never sent >>>> to resource servers.”, cite from section 1.5 of the OAuth RFC 6749.) >>>> * While the OAuth RFC states "The authorization server may be the same >>>> server as the resource server or a separate entity", this should not be >>>> mandated. i.e the spec should be open to supporting implementations >>>> that >>>> have the authorization server and resource server co-located as well as >>>> separate. >>>> >>>> The Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’, >>>> intentionally marked to “To exchange client credentials (client ID and >>>> secret) for an access token. This uses the client credentials flow.” >>>> [3]. Technically: client ID and secret are submitted using a HTTP POST >>>> request to that Iceberg REST endpoint. >>>> >>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily >>>> be seen as a hard requirement. In order to implement this in compliance >>>> with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the >>>> authorization server. If users do not (want to) implement an >>>> authorization server, the only way to implement this ‘/v1/oauth/tokens’ >>>> endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual >>>> authorization server, which means, that this proxy technically becomes >>>> a >>>> “man in the middle” - knowing both all credentials and all involved >>>> tokens. >>>> >>>> Even if an Iceberg REST server does not implement the >>>> ‘/v1/oauth/tokens’ >>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ >>>> containing >>>> clear text credentials, if clients are misconfigured (humans do make >>>> mistakes) - it’s a non-zero risk - bad actors can implement/intercept >>>> that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured >>>> clients to send credentials. >>>> >>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To >>>> exchange a client token and an identity token for a more specific >>>> access >>>> token. This uses the token exchange flow.” and “To exchange an access >>>> token for one with the same claims and a refreshed expiration period >>>> This uses the token exchange flow.” Both usages should and can be >>>> implemented differently. >>>> >>>> Apache Iceberg, as a table format project, should recommend protecting >>>> sensitive information. But Iceberg should not mandate _how_ that >>>> protection is implemented - but the Iceberg REST specification does >>>> effectively mandate OAuth 2.0, because other Iceberg REST endpoints do >>>> refer/require OAuth 2.0 specifics. Users that want to use other >>>> mechanisms, because they are forced to do so by their organization, >>>> would be locked out of Iceberg REST. >>>> >>>> Apache Iceberg should not mandate OAuth 2.0 as the only option - for >>>> the >>>> sake of openness for the project and flexibility for the server >>>> implementations. >>>> >>>> We think that Apache Iceberg REST Catalog spec should not mandate that >>>> a >>>> catalog implementation responds to requests to produce Auth Tokens >>>> (since the REST spec v1 defines a /v1/tokens endpoint, current >>>> implementations have to take deliberate actions when responding to >>>> those >>>> requests, whether with successful token responses or with “access >>>> denied” or “unsupported” responses). >>>> >>>> We propose the following actions: >>>> 1. Immediate mitigation: >>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from the Iceberg’s >>>> OpenAPI spec w/o replacement. >>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg >>>> client, make the existing client parameter “oauth2-server-uri” >>>> mandatory. The Iceberg REST catalog must fail to initialize if the >>>> “oauth2-server-uri” parameter is not defined. >>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint. >>>> 1.4. Forbid or discourage the communication of tokens from any Iceberg >>>> REST Catalog endpoint, both via the "token" property or with any of the >>>> "urn:ietf:params:oauth:token-type:*" properties. >>>> 2. As a follow up: We’d propose a couple of implementation fixes and >>>> changes and test improvements. >>>> 3. As a follow up: Define a discovery mechanism for both the Iceberg >>>> REST base URI and OAuth 2.0 endpoints/discovery, which allows users to >>>> use a single URI to securely access Iceberg REST endpoints. >>>> 4. As a follow up: Not new, but we also want to improve the Iceberg >>>> REST >>>> specification via the “new” REST proposal. >>>> >>>> We do not think that adding recommendations to inline-documentation is >>>> enough to fully mitigate the above concerns. >>>> >>>> >>>> References: >>>> >>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework, >>>> https://datatracker.ietf.org/doc/html/rfc6749 >>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec - >>>> https://github.com/apache/iceberg/pull/4771 >>>> [3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to >>>> the REST spec - https://github.com/apache/iceberg/pull/4843 >>>> >>>> -- >>>> Robert Stupp >>>> @snazy >>>> >>>>