Sounds like we should try to finalize a consensus around
https://github.com/apache/iceberg/pull/9940, so that we make it very clear
what APIs/features are optional.

-Jack

On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> wrote:

> Hey Robert,
>
> Sorry for the late reply as I was out last week. I'm not an OAuth guru
> either, but some context from my end.
>
> * Credentials (for example username/password) must _never_ be sent to
>> the resource server, only to the authorization server.
>
>
> In an earlier discussion <https://github.com/apache/iceberg/pull/8976>,
> it was agreed that the resource server can also function as the
> authorization server. But the roles can also be separate.
>
> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg
>> client, make the existing client parameter “oauth2-server-uri”
>> mandatory. The Iceberg REST catalog must fail to initialize if the
>> “oauth2-server-uri” parameter is not defined.
>
>
> It can also be that there is no authentication in the case of an internal
> REST catalog. For example, the iceberg-rest-image
> <https://github.com/tabular-io/iceberg-rest-image> that we use for
> integration tests in PyIceberg.
>
> We think that Apache Iceberg REST Catalog spec should not mandate that a
>> catalog implementation responds to requests to produce Auth Tokens
>> (since the REST spec v1 defines a /v1/tokens endpoint, current
>> implementations have to take deliberate actions when responding to those
>> requests, whether with successful token responses or with “access
>> denied” or “unsupported” responses).
>
> The `/v1/tokens` endpoint is optional
> <https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489>
> .
>
> * Credentials (for example username/password) must _never_ be sent to
>> the resource server, only to the authorization server.
>
>
> I fully agree!
>
> Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’
>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing
>> clear text credentials, if clients are misconfigured (humans do make
>> mistakes) - it’s a non-zero risk - bad actors can implement/intercept
>> that  ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured
>> clients to send credentials.
>
>
> I think the wording is chosen badly. It should not send any credentials,
> but the code (as in this example
> <https://developers.google.com/identity/protocols/oauth2#installed> by
> GCS).
>
> I think Jack makes a good point with AWS SigV4 Authentication. I suppose,
>> in REST Catalog implementations that support that auth method, the
>> /v1/oauth/token Catalog REST endpoint is redundant.
>>
>
> There are other cloud providers next to AWS.
>
> Kind regards,
> Fokko
>
>
>
> Op do 23 mei 2024 om 15:49 schreef Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid>:
>
>> I think Jack makes a good point with AWS SigV4 Authentication. I suppose,
>> in REST Catalog implementations that support that auth method, the
>> /v1/oauth/token Catalog REST endpoint is redundant.
>>
>> Cheers,
>> Dmitri.
>>
>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> I do not know enough details about OAuth to make comments about this
>>> issue, but just regarding the statement "OAuth2 is the only mechanism
>>> supported by the Iceberg client", AWS Sigv4 auth is also supported, at
>>> least in the Java client implementation
>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>.
>>> It would be nice if we formalize that in the spec, at least define it as a
>>> generic authorization header.
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>>
>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Iceberg REST implementations, either accessible on the public internet
>>>> or inside an organization, are usually being secured using appropriate
>>>> authorization mechanisms. The Nessie team is looking at implementing
>>>> the
>>>> Iceberg REST specification and have some questions around the security
>>>> endpoint(s) defined in the spec.
>>>>
>>>> TL;DR we have questions (potentially concerns) about having the
>>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think
>>>> that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance
>>>> issues, and imposes how authorization should be implemented.
>>>> * As an open table format, it would be good for Iceberg to focus on the
>>>> table format / catalog and not how authorization is implemented. The
>>>> existence of an OAuth endpoint pushes implementations to adopt
>>>> authorization using only OAuth, whereas the implementers might choose
>>>> several other ways to implement authorization (e.g. SAML). In our
>>>> opinion the spec should leave it open to the implementation to decide
>>>> how authorization will be implemented.
>>>> * The existence of that endpoint also pushes operators of Iceberg REST
>>>> endpoints into the authorization service business.
>>>> * Clients might expose their clear-text credentials to the wrong
>>>> service, if the (correct) OAuth endpoint is not configured (humans do
>>>> make mistakes).
>>>> * (Naive) Iceberg REST servers may proxy requests received for
>>>> ‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”,
>>>> which
>>>> is not fully compliant with the OAuth 2.0 specification.
>>>>
>>>> Our goals with this discussion are:
>>>> 1. Secure the Iceberg REST specification by preventing accidental
>>>> misuse/misimplementation.
>>>> 2. Prevent that Iceberg REST to get into dictating the “authorization
>>>> server specifics”.
>>>> 3. Enable flexibility for Iceberg REST servers to opt for other
>>>> authorization mechanisms than OAuth 2.0.
>>>> 4. Enable REST servers to opt for integrating with any standard OAuth2
>>>> /
>>>> OIDC provider (e.g. Okta, Keycloak, Authelia).
>>>>
>>>> OAuth 2.0 [1] is one of the common standards accepted in the industry.
>>>> It defines a secure mechanism to access resources (here: Iceberg REST
>>>> endpoints). The most important aspect for OAuth 2.0 resources is that
>>>> (Iceberg REST) servers do not (have to) support password
>>>> authentication,
>>>> especially considering the security weaknesses inherent in passwords.
>>>> Compromised passwords result in compromised data protected by that
>>>> password.
>>>>
>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of these are:
>>>> * Credentials (for example username/password) must _never_ be sent to
>>>> the resource server, only to the authorization server.
>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource server,
>>>> only to the authorization server. (“Unlike access tokens, refresh
>>>> tokens
>>>> are intended for use only with authorization servers and are never sent
>>>> to resource servers.”, cite from section 1.5 of the OAuth RFC 6749.)
>>>> * While the OAuth RFC states "The authorization server may be the same
>>>> server as the resource server or a separate entity", this should not be
>>>> mandated. i.e the spec should be open to supporting implementations
>>>> that
>>>> have the authorization server and resource server co-located as well as
>>>> separate.
>>>>
>>>> The Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’,
>>>> intentionally marked to “To exchange client credentials (client ID and
>>>> secret) for an access token. This uses the client credentials flow.”
>>>> [3]. Technically: client ID and secret are submitted using a HTTP POST
>>>> request to that Iceberg REST endpoint.
>>>>
>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily
>>>> be seen as a hard requirement. In order to implement this in compliance
>>>> with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the
>>>> authorization server. If users do not (want to) implement an
>>>> authorization server, the only way to implement this ‘/v1/oauth/tokens’
>>>> endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual
>>>> authorization server, which means, that this proxy technically becomes
>>>> a
>>>> “man in the middle” - knowing both all credentials and all involved
>>>> tokens.
>>>>
>>>> Even if an Iceberg REST server does not implement the
>>>> ‘/v1/oauth/tokens’
>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’
>>>> containing
>>>> clear text credentials, if clients are misconfigured (humans do make
>>>> mistakes) - it’s a non-zero risk - bad actors can implement/intercept
>>>> that  ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured
>>>> clients to send credentials.
>>>>
>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To
>>>> exchange a client token and an identity token for a more specific
>>>> access
>>>> token. This uses the token exchange flow.” and “To exchange an access
>>>> token for one with the same claims and a refreshed expiration period
>>>> This uses the token exchange flow.” Both usages should and can be
>>>> implemented differently.
>>>>
>>>> Apache Iceberg, as a table format project, should recommend protecting
>>>> sensitive information. But Iceberg should not mandate _how_ that
>>>> protection is implemented - but the Iceberg REST specification does
>>>> effectively mandate OAuth 2.0, because other Iceberg REST endpoints do
>>>> refer/require OAuth 2.0 specifics. Users that want to use other
>>>> mechanisms, because they are forced to do so by their organization,
>>>> would be locked out of Iceberg REST.
>>>>
>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option - for
>>>> the
>>>> sake of openness for the project and flexibility for the server
>>>> implementations.
>>>>
>>>> We think that Apache Iceberg REST Catalog spec should not mandate that
>>>> a
>>>> catalog implementation responds to requests to produce Auth Tokens
>>>> (since the REST spec v1 defines a /v1/tokens endpoint, current
>>>> implementations have to take deliberate actions when responding to
>>>> those
>>>> requests, whether with successful token responses or with “access
>>>> denied” or “unsupported” responses).
>>>>
>>>> We propose the following actions:
>>>> 1. Immediate mitigation:
>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from the Iceberg’s
>>>> OpenAPI spec w/o replacement.
>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg
>>>> client, make the existing client parameter “oauth2-server-uri”
>>>> mandatory. The Iceberg REST catalog must fail to initialize if the
>>>> “oauth2-server-uri” parameter is not defined.
>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
>>>> 1.4. Forbid or discourage the communication of tokens from any Iceberg
>>>> REST Catalog endpoint, both via the "token" property or with any of the
>>>> "urn:ietf:params:oauth:token-type:*" properties.
>>>> 2. As a follow up: We’d propose a couple of implementation fixes and
>>>> changes and test improvements.
>>>> 3. As a follow up: Define a discovery mechanism for both the Iceberg
>>>> REST base URI and OAuth 2.0 endpoints/discovery, which allows users to
>>>> use a single URI to securely access Iceberg REST endpoints.
>>>> 4. As a follow up: Not new, but we also want to improve the Iceberg
>>>> REST
>>>> specification via the “new” REST proposal.
>>>>
>>>> We do not think that adding recommendations to inline-documentation is
>>>> enough to fully mitigate the above concerns.
>>>>
>>>>
>>>> References:
>>>>
>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework,
>>>> https://datatracker.ietf.org/doc/html/rfc6749
>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec -
>>>> https://github.com/apache/iceberg/pull/4771
>>>> [3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to
>>>> the REST spec - https://github.com/apache/iceberg/pull/4843
>>>>
>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>>

Reply via email to