Just to reiterate my points discussed in the community sync here:
the more I think about it the more I agree the OAuth endpoint
*should be removed from the REST spec*. Even though the endpoint is
optional, and even if we do not care about the security concerns, it
still gives users the impression that the endpoint "should" be
implemented, or "is the preferred authentication mechanism". And as
we have found out, the server capability proposal does not cover
this case since this is the first endpoint to hit before the
GetConfig endpoint.
As Ryan said, if we want to do that we need an alternative plan. I
don't have anything concrete, but here is my line of thought:
1. remove OAuth2 endpoint from the "REST OpenAPI spec"
2. create a client-side interface (in each language) that different
authentication mechanisms can be plugged in to talk to the REST catalog
3. refactor and make OAuth2 an implementation of that interface. I
can also help with doing the same for AWS Sigv4, and the community
can further support some additional ones like Kerberos, SAML, Google
SSO, etc. based on the individual use cases.
4. turn 2 + 3 into a "REST catalog authentication spec" that
documents all the supported authentication mechanisms and
their defaults. For OAuth2, the default is to have the auth server
at the same endpoint as the resource server for backwards
compatibility, but that is a configurable property, and we could
recommend not to do that based on security concerns.
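
A rough sketch of what steps 2 and 3 could look like on the client side, in Python. All names here (`AuthManager`, `NoAuth`, `OAuth2ClientCredentials`, `fetch_token`, `build_request_headers`) are hypothetical and not part of any existing Iceberg client; the token fetch is injected as a callable so the sketch performs no network I/O.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class AuthManager(ABC):
    """Hypothetical client-side plug-in point: one implementation per mechanism."""

    @abstractmethod
    def auth_headers(self) -> Dict[str, str]:
        """Return the headers to attach to each REST catalog request."""


class NoAuth(AuthManager):
    """For catalogs that need no authentication (e.g. local test setups)."""

    def auth_headers(self) -> Dict[str, str]:
        return {}


class OAuth2ClientCredentials(AuthManager):
    """OAuth2 as just one implementation of the interface.

    `fetch_token` is an assumed callable that talks to the authorization
    server at `server_uri` and returns an access token string.
    """

    def __init__(self, server_uri: str, fetch_token: Callable[[str], str]):
        self._server_uri = server_uri
        self._fetch_token = fetch_token
        self._token = None  # fetched lazily on first use, then cached

    def auth_headers(self) -> Dict[str, str]:
        if self._token is None:
            self._token = self._fetch_token(self._server_uri)
        return {"Authorization": f"Bearer {self._token}"}


def build_request_headers(auth: AuthManager) -> Dict[str, str]:
    """The catalog client depends only on the interface, not the mechanism."""
    headers = {"Accept": "application/json"}
    headers.update(auth.auth_headers())
    return headers
```

Under this shape, supporting SigV4 or Kerberos would mean adding another `AuthManager` subclass without touching the catalog client itself.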
Best,
Jack Ye
On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:
Wondering if the auth endpoints can be separated out to a
separate OpenAPI spec file. Then we still have some reference
for interactions with auth server and make it clear it is not
required as part of the REST catalog server. In most enterprise
environments, auth server is likely a separate server.
On Tue, May 28, 2024 at 1:25 PM Alex Dutra
<alex.du...@dremio.com.invalid> wrote:
Hi,
On point 4, isn't that possible today? Can't that be
achieved with the current token exchange approach, and
the internal implementation of the endpoint?
Unfortunately, no. Token exchange is not widely adopted yet:
for example, Keycloak has only partial support for it, and
Authelia, or Authentik, have no support for it at all.
This, and a few other technical issues with the current
internals of the REST client, make it nearly impossible to
achieve a good integration of Iceberg REST with the majority
of popular OSS authorization servers.
I am planning to start another email thread to discuss these
practicalities, but let's first reach consensus on the
broader security issues voiced here, before we tackle the
details.
Thanks,
Alex Dutra
On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar
<am...@tabular.io> wrote:
I disagree with removing "/v1/oauth/tokens" and I think
I also disagree with the premise that implementing that
endpoint is required, but I can understand how that's
not clear in the spec. I think we can address the
required vs non-required discussion with the
capabilities PR.
<https://github.com/apache/iceberg/pull/9940>
It seems like another part of what's driving this
discussion is a concern about how we ensure that REST
catalog implementations which do implement this endpoint
do so securely (for example, to avoid the MITM scenario
that was brought up). This is ultimately a runtime
detail. To me it seems that if we make it clear that
such an endpoint should be implemented respecting OAuth2
standards, and we know that OAuth2 compliance requires
avoiding that MITM situation, then runtime
implementations should just follow the spec there.
>3. Enable flexibility for Iceberg REST servers to opt
for other
authorization mechanisms than OAuth 2.0.
>4. Enable REST servers to opt for integrating with any
standard OAuth2 /
OIDC provider (e.g. Okta, Keycloak, Authelia).
I agree with both of these points; again, I don't think
the intention is that OAuth2 is the only way, but I think
the capabilities PR will make that even more clear.
On point 4, isn't that possible today? Can't that be
achieved with the current token exchange approach, and
the internal implementation of the endpoint? Sorry if I
missed that explanation.
Thanks,
Amogh Jahagirdar
On Tue, May 28, 2024 at 11:13 AM Yufei Gu
<flyrain...@gmail.com> wrote:
Not an expert on authentication, but reading from
the context, I agree that it’s not a good practice
to use a resource server as a token server. The
resource server would need to securely handle and
store credentials or tokens, increasing the risk of
credential theft or leakage. Making the token
endpoint optional will mitigate the issue a bit. But
if we want to disable it completely, it's better to
do it now to prevent any issues and migration costs
in the future. Can we have a consensus on it?
I would prefer to deprecate it to prevent any
intentional and unintentional misuse. We will also
need to change the clients since they connect to the
endpoint by default.
Yufei
On Tue, May 28, 2024 at 8:27 AM Jack Ye
<yezhao...@gmail.com> wrote:
Sounds like we should try to finalize a
consensus around
https://github.com/apache/iceberg/pull/9940, so
that we make it very clear what APIs/features
are optional.
-Jack
On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong
<fo...@apache.org> wrote:
Hey Robert,
Sorry for the late reply as I was out last
week. I'm not an OAuth guru either, but some
context from my end.
* Credentials (for example
username/password) must _never_ be sent to
the resource server, only to the
authorization server.
In an earlier discussion
<https://github.com/apache/iceberg/pull/8976>,
it was agreed that the resource server can
also function as the authorization server.
But the roles can also be separate.
1.2. As long as OAuth2 is the only mechanism supported by the
Iceberg client, make the existing client parameter
“oauth2-server-uri” mandatory. The Iceberg REST catalog must
fail to initialize if the “oauth2-server-uri” parameter is not
defined.
It can also be that there is no
authentication in the case of an internal
REST catalog. For example, the
iceberg-rest-image
<https://github.com/tabular-io/iceberg-rest-image>
that we use for integration tests in PyIceberg.
We think that the Apache Iceberg REST Catalog spec should not
mandate that a catalog implementation responds to requests to
produce Auth Tokens (since the REST spec v1 defines a
/v1/oauth/tokens endpoint, current implementations have to take
deliberate actions when responding to those requests, whether
with successful token responses or with “access denied” or
“unsupported” responses).
The `/v1/oauth/tokens` endpoint is optional
<https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489>.
* Credentials (for example
username/password) must _never_ be sent to
the resource server, only to the
authorization server.
I fully agree!
Even if an Iceberg REST server does not implement the
‘/v1/oauth/tokens’ endpoint, it can still receive requests to
‘/v1/oauth/tokens’ containing clear text credentials, if clients
are misconfigured (humans do make mistakes) - it’s a non-zero
risk - bad actors can implement/intercept that
‘/v1/oauth/tokens’ endpoint and just wait for misconfigured
clients to send credentials.
I think the wording is poorly chosen. The client should not
send any credentials, only the code (as in this example
<https://developers.google.com/identity/protocols/oauth2#installed> by
GCS).
I think Jack makes a good point with AWS SigV4 Authentication.
I suppose, in REST Catalog implementations that support that
auth method, the /v1/oauth/tokens Catalog REST endpoint is
redundant.
There are other cloud providers next to AWS.
Kind regards,
Fokko
On Thu, May 23, 2024 at 3:49 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:
I think Jack makes a good point with AWS SigV4 Authentication.
I suppose, in REST Catalog implementations that support that
auth method, the /v1/oauth/tokens Catalog REST endpoint is
redundant.
Cheers,
Dmitri.
On Thu, May 23, 2024 at 9:20 AM Jack Ye
<yezhao...@gmail.com> wrote:
I do not know enough details about
OAuth to make comments about this
issue, but just regarding the
statement "OAuth2 is the only
mechanism supported by the Iceberg
client", AWS Sigv4 auth is also
supported, at least in the Java
client implementation
<https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>.
It would be nice if we formalize
that in the spec, at least define it
as a generic authorization header.
Best,
Jack Ye
On Thu, May 23, 2024 at 2:51 AM
Robert Stupp <sn...@snazy.de> wrote:
Hi all,
Iceberg REST implementations, either accessible on the public internet
or inside an organization, are usually secured using appropriate
authorization mechanisms. The Nessie team is looking at implementing the
Iceberg REST specification and has some questions around the security
endpoint(s) defined in the spec.
TL;DR we have questions (potentially concerns) about having the
‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We think
that ‘/v1/oauth/tokens’ poses potential security and OAuth2 compliance
issues, and dictates how authorization should be implemented.
* As an open table format, it would be good for Iceberg to focus on the
table format / catalog and not how authorization is implemented. The
existence of an OAuth endpoint pushes implementations to adopt
authorization using only OAuth, whereas the implementers might choose
several other ways to implement authorization (e.g. SAML). In our
opinion the spec should leave it open to the implementation to decide
how authorization will be implemented.
* The existence of that endpoint also pushes operators of Iceberg REST
endpoints into the authorization service business.
* Clients might expose their clear-text credentials to the wrong
service, if the (correct) OAuth endpoint is not configured (humans do
make mistakes).
* (Naive) Iceberg REST servers may proxy requests received for
‘/v1/oauth/tokens’ - and effectively become a “man-in-the-middle”, which
is not fully compliant with the OAuth 2.0 specification.
Our goals with this discussion are:
1. Secure the Iceberg REST specification by preventing accidental
misuse/misimplementation.
2. Prevent Iceberg REST from dictating the “authorization server
specifics”.
3. Enable flexibility for Iceberg REST servers to opt for other
authorization mechanisms than OAuth 2.0.
4. Enable REST servers to opt for integrating with any standard OAuth2 /
OIDC provider (e.g. Okta, Keycloak, Authelia).
OAuth 2.0 [1] is one of the common standards accepted in the industry.
It defines a secure mechanism to access resources (here: Iceberg REST
endpoints). The most important aspect for OAuth 2.0 resources is that
(Iceberg REST) servers do not (have to) support password authentication,
especially considering the security weaknesses inherent in passwords.
Compromised passwords result in compromised data protected by that
password.
Therefore OAuth 2.0 defines a set of strict rules. Some of these are:
* Credentials (for example username/password) must _never_ be sent to
the resource server, only to the authorization server.
* OAuth 2.0 refresh tokens must _never_ be sent to the resource server,
only to the authorization server. (“Unlike access tokens, refresh tokens
are intended for use only with authorization servers and are never sent
to resource servers.”, quoted from section 1.5 of the OAuth RFC 6749.)
* While the OAuth RFC states "The authorization server may be the same
server as the resource server or a separate entity", this should not be
mandated, i.e. the spec should be open to supporting implementations
that have the authorization server and resource server co-located as
well as separate.
The Iceberg PR 4771 [2] added the OpenAPI path ‘/v1/oauth/tokens’,
documented with the purpose “To exchange client credentials (client ID
and secret) for an access token. This uses the client credentials flow.”
[3]. Technically: client ID and secret are submitted using an HTTP POST
request to that Iceberg REST endpoint.
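
To make the concern concrete, here is a minimal sketch of what such a client credentials request looks like on the wire, following the form-encoded shape of RFC 6749 (section 4.4, with the client credentials carried in the request body as allowed by section 2.3.1). The function name and the example URL are made up for illustration; the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def client_credentials_request(token_url: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth2 client credentials token request."""
    # The secret travels in the request body, so whichever host
    # `token_url` names gets to see it in clear text (inside TLS).
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical usage: the target decides who receives the secret.
req = client_credentials_request(
    "https://catalog.example.com/v1/oauth/tokens", "my-id", "my-secret"
)
```

If `token_url` points at the catalog rather than at the real authorization server, the catalog (or whatever answers on that path) receives the client secret.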
Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can easily
be seen as a hard requirement. In order to implement this in compliance
with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the
authorization server. If users do not (want to) implement an
authorization server, the only way to implement this ‘/v1/oauth/tokens’
endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual
authorization server, which means that this proxy technically becomes a
“man in the middle” - knowing both all credentials and all involved
tokens.
Even if an Iceberg REST server does not implement the ‘/v1/oauth/tokens’
endpoint, it can still receive requests to ‘/v1/oauth/tokens’ containing
clear text credentials, if clients are misconfigured (humans do make
mistakes) - it’s a non-zero risk - bad actors can implement/intercept
that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured
clients to send credentials.
Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ are “To
exchange a client token and an identity token for a more specific access
token. This uses the token exchange flow.” and “To exchange an access
token for one with the same claims and a refreshed expiration period.
This uses the token exchange flow.” Both usages should and can be
implemented differently.
Apache Iceberg, as a table format project, should recommend protecting
sensitive information. But Iceberg should not mandate _how_ that
protection is implemented - but the Iceberg REST specification does
effectively mandate OAuth 2.0, because other Iceberg REST endpoints do
refer/require OAuth 2.0 specifics. Users that want to use other
mechanisms, because they are forced to do so by their organization,
would be locked out of Iceberg REST.
Apache Iceberg should not mandate OAuth 2.0 as the only option - for the
sake of openness for the project and flexibility for the server
implementations.
We think that the Apache Iceberg REST Catalog spec should not mandate
that a catalog implementation responds to requests to produce Auth
Tokens (since the REST spec v1 defines a /v1/oauth/tokens endpoint,
current implementations have to take deliberate actions when responding
to those requests, whether with successful token responses or with
“access denied” or “unsupported” responses).
We propose the following actions:
1. Immediate mitigation:
1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from Iceberg’s
OpenAPI spec without replacement.
1.2. As long as OAuth2 is the only mechanism supported by the Iceberg
client, make the existing client parameter “oauth2-server-uri”
mandatory. The Iceberg REST catalog must fail to initialize if the
“oauth2-server-uri” parameter is not defined.
1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
1.4. Forbid or discourage the communication of tokens from any Iceberg
REST Catalog endpoint, either via the "token" property or with any of
the "urn:ietf:params:oauth:token-type:*" properties.
2. As a follow up: We’d propose a couple of implementation fixes and
changes and test improvements.
3. As a follow up: Define a discovery mechanism for both the Iceberg
REST base URI and OAuth 2.0 endpoints/discovery, which allows users to
use a single URI to securely access Iceberg REST endpoints.
4. As a follow up: Not new, but we also want to improve the Iceberg REST
specification via the “new” REST proposal.
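
As an illustration of items 1.2 and 1.3 above, a minimal sketch of a client-side check that refuses to initialize without an explicit authorization server and never derives a token endpoint from the catalog URI. The function and exception names are hypothetical, as is the plain `uri` key used in the example; only the “oauth2-server-uri” parameter name comes from the proposal.

```python
from typing import Mapping


class MissingAuthServerError(ValueError):
    """Raised when no explicit authorization server is configured."""


def resolve_token_endpoint(properties: Mapping[str, str]) -> str:
    """Return the configured token endpoint, with no fallback.

    The client never builds '<catalog uri>/v1/oauth/tokens' on its own,
    so a misconfiguration fails fast instead of sending credentials to
    the resource server.
    """
    server_uri = properties.get("oauth2-server-uri", "").strip()
    if not server_uri:
        raise MissingAuthServerError(
            "'oauth2-server-uri' must be set; "
            "refusing to fall back to the catalog URI"
        )
    return server_uri


# Hypothetical catalog properties for illustration.
props = {
    "uri": "https://catalog.example.com",
    "oauth2-server-uri": "https://auth.example.com/token",
}
endpoint = resolve_token_endpoint(props)
```

With only `uri` set, the call raises instead of quietly targeting the catalog host.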
We do not think that adding recommendations to inline-documentation is
enough to fully mitigate the above concerns.
References:
[1] RFC 6749 - The OAuth 2.0 Authorization Framework,
https://datatracker.ietf.org/doc/html/rfc6749
[2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog spec -
https://github.com/apache/iceberg/pull/4771
[3] Iceberg pull request 4843 - Spec: Add more context about OAuth2 to
the REST spec - https://github.com/apache/iceberg/pull/4843
--
Robert Stupp
@snazy