Hi all,
Iceberg REST implementations, whether accessible on the public internet
or only inside an organization, are usually secured using appropriate
authorization mechanisms. The Nessie team is looking at implementing the
Iceberg REST specification and has some questions around the security
endpoint(s) defined in the spec.
TL;DR: we have questions (and potentially concerns) about the
‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think
that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance
issues, and dictates how authorization must be implemented.
* As an open table format, it would be good for Iceberg to focus on the
table format / catalog and not how authorization is implemented. The
existence of an OAuth endpoint pushes implementations to adopt
authorization using only OAuth, whereas the implementers might choose
several other ways to implement authorization (e.g. SAML). In our
opinion the spec should leave it open to the implementation to decide
how authorization will be implemented.
* The existence of that endpoint also pushes operators of Iceberg REST
endpoints into the authorization service business.
* Clients might expose their clear-text credentials to the wrong
service, if the (correct) OAuth endpoint is not configured (humans do
make mistakes).
* (Naive) Iceberg REST servers may proxy requests received for
‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”, which
is not fully compliant with the OAuth 2.0 specification.
Our goals with this discussion are:
1. Secure the Iceberg REST specification by preventing accidental
misuse/misimplementation.
2. Prevent Iceberg REST from dictating the “authorization server
specifics”.
3. Enable flexibility for Iceberg REST servers to opt for authorization
mechanisms other than OAuth 2.0.
4. Enable REST servers to opt for integrating with any standard OAuth2 /
OIDC provider (e.g. Okta, Keycloak, Authelia).
OAuth 2.0 [1] is one of the common standards accepted in the industry.
It defines a secure mechanism to access resources (here: Iceberg REST
endpoints). The most important aspect for OAuth 2.0 resource servers is
that they (here: Iceberg REST servers) do not have to support password
authentication, which matters given the security weaknesses inherent in
passwords: a compromised password compromises the data it protects.
Therefore OAuth 2.0 defines a set of strict rules. Some of these are:
* Credentials (for example username/password) must _never_ be sent to
the resource server, only to the authorization server.
* OAuth 2.0 refresh tokens must _never_ be sent to the resource server,
only to the authorization server. (“Unlike access tokens, refresh tokens
are intended for use only with authorization servers and are never sent
to resource servers.”, quoted from section 1.5 of RFC 6749.)
* While the OAuth RFC states "The authorization server may be the same
server as the resource server or a separate entity", co-location should
not be mandated. That is, the Iceberg spec should support implementations
that have the authorization server and resource server co-located as
well as implementations that keep them separate.
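
To make the first two rules concrete, here is a minimal Python sketch of
the separation they imply. The URLs, function names and client values are
made up for illustration and are not part of Iceberg or of any client
library. The requests are only constructed, never sent, and the sketch
assumes the authorization server accepts HTTP Basic client authentication.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical endpoints, for illustration only.
AUTH_SERVER_TOKEN_ENDPOINT = "https://auth.example.com/oauth2/token"
CATALOG_BASE_URI = "https://catalog.example.com/api"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Client credentials grant (RFC 6749, section 4.4).

    Addressed to the authorization server only; the credentials never
    travel to the catalog (the resource server).
    """
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        AUTH_SERVER_TOKEN_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def build_catalog_request(path: str, access_token: str) -> urllib.request.Request:
    """Request to the resource server: carries only the access token."""
    return urllib.request.Request(
        f"{CATALOG_BASE_URI}{path}",
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

In this arrangement the catalog only ever sees a bearer access token,
while the client secret (and any refresh token) stays between the client
and the authorization server.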
Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’, which is
documented as being intended “To exchange client credentials (client ID
and secret) for an access token. This uses the client credentials flow.”
[3]. Technically: client ID and secret are submitted using an HTTP POST
request to that Iceberg REST endpoint.
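
For readers less familiar with the flow, the sketch below shows roughly
what such a request looks like on the wire. The host name, client ID and
secret are invented, and the exact set of form fields sent by a given
client is an assumption here. The point is only that the secret is part
of the request body that the Iceberg REST server receives.

```python
import urllib.parse

# Hypothetical catalog base URI and credentials, for illustration only.
catalog_base_uri = "https://catalog.example.com"
form_fields = {
    "grant_type": "client_credentials",
    "client_id": "my-client",
    "client_secret": "s3cr3t",
}

url = f"{catalog_base_uri}/v1/oauth/tokens"
body = urllib.parse.urlencode(form_fields)

# Print the request as it would appear to whichever server receives it.
print(f"POST {url}")
print("Content-Type: application/x-www-form-urlencoded")
print()
print(body)
```

The body is plain form data, so whichever server terminates this request
can read the secret in clear text.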
Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily
be seen as a hard requirement. In order to implement this in compliance
with the OAuth 2.0 spec, the server behind ‘/v1/oauth/tokens’ MUST be
the authorization server. If users do not (want to) implement an
authorization server, the only way to implement this ‘/v1/oauth/tokens’
endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual
authorization server, which means that this proxy technically becomes a
“man in the middle” that knows all credentials and all involved tokens.
Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’
endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing
clear-text credentials if clients are misconfigured (humans do make
mistakes). This is a non-zero risk: bad actors can implement or
intercept that ‘/v1/oauth/tokens’ endpoint and just wait for
misconfigured clients to send credentials.
Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To
exchange a client token and an identity token for a more specific access
token. This uses the token exchange flow.” and “To exchange an access
token for one with the same claims and a refreshed expiration period.
This uses the token exchange flow.” Both usages can, and should, be
implemented differently.
Apache Iceberg, as a table format project, should recommend protecting
sensitive information, but it should not mandate _how_ that protection
is implemented. Today the Iceberg REST specification does effectively
mandate OAuth 2.0, because other Iceberg REST endpoints refer to or
require OAuth 2.0 specifics. Users that have to use other mechanisms,
because their organization requires it, would be locked out of Iceberg
REST.
Apache Iceberg should not mandate OAuth 2.0 as the only option - for the
sake of openness for the project and flexibility for the server
implementations.
We think that the Apache Iceberg REST Catalog spec should not mandate
that a catalog implementation responds to requests to produce auth
tokens (since the REST spec v1 defines the ‘/v1/oauth/tokens’ endpoint,
current implementations have to take deliberate action when responding
to those requests, whether with successful token responses or with
“access denied” or “unsupported” responses).
We propose the following actions:
1. Immediate mitigation:
1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from Iceberg’s
OpenAPI spec, without replacement.
1.2. As long as OAuth2 is the only mechanism supported by the Iceberg
client, make the existing client parameter “oauth2-server-uri”
mandatory. The Iceberg REST catalog must fail to initialize if the
“oauth2-server-uri” parameter is not defined.
1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
1.4. Forbid or discourage the communication of tokens from any Iceberg
REST Catalog endpoint, whether via the "token" property or via any of
the "urn:ietf:params:oauth:token-type:*" properties.
2. As a follow-up: we would propose a couple of implementation fixes and
changes, plus test improvements.
3. As a follow up: Define a discovery mechanism for both the Iceberg
REST base URI and OAuth 2.0 endpoints/discovery, which allows users to
use a single URI to securely access Iceberg REST endpoints.
4. As a follow up: Not new, but we also want to improve the Iceberg REST
specification via the “new” REST proposal.
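
As an illustration of proposals 1.2 and 1.4, the checks could look
roughly like the following Python sketch. This is not the Iceberg client
code: the function names are hypothetical, and only the property names
“oauth2-server-uri”, "token" and the "urn:ietf:params:oauth:token-type:"
prefix are taken from the text above.

```python
from typing import List, Mapping

OAUTH2_SERVER_URI = "oauth2-server-uri"
TOKEN_TYPE_PREFIX = "urn:ietf:params:oauth:token-type:"


def require_oauth2_server_uri(properties: Mapping[str, str]) -> str:
    """Proposal 1.2: refuse to initialize without an explicit token endpoint."""
    uri = properties.get(OAUTH2_SERVER_URI, "").strip()
    if not uri:
        raise ValueError(
            f"'{OAUTH2_SERVER_URI}' must be set; there is no fallback "
            "to the catalog's /v1/oauth/tokens endpoint"
        )
    return uri


def token_properties(server_properties: Mapping[str, str]) -> List[str]:
    """Proposal 1.4: names of token-carrying properties sent by the catalog."""
    return sorted(
        key
        for key in server_properties
        if key == "token" or key.startswith(TOKEN_TYPE_PREFIX)
    )
```

A client could call require_oauth2_server_uri() during catalog
initialization, and reject or warn when token_properties() returns a
non-empty list for properties received from the server.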
We do not think that adding recommendations to inline-documentation is
enough to fully mitigate the above concerns.
References:
[1] RFC 6749 - The OAuth 2.0 Authorization Framework -
https://datatracker.ietf.org/doc/html/rfc6749
[2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec -
https://github.com/apache/iceberg/pull/4771
[3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to
the REST spec - https://github.com/apache/iceberg/pull/4843
--
Robert Stupp
@snazy