Just to reiterate my points discussed in the community sync here: the
more I think about it the more I agree the OAuth endpoint *should be
removed from the REST spec*. Even though the endpoint is optional, and
even if we do not care about the security concerns, it still provides
users an impression that the endpoint "should" be implemented, or "is
the preferred authentication mechanism". And as we have found out, the
server capability proposal does not cover this case since this is the
first endpoint to hit before the GetConfig endpoint.
As Ryan said, if we want to do that we need an alternative plan. I
don't have anything concrete, but here is my line of thought:
1. remove OAuth2 endpoint from the "REST OpenAPI spec"
2. create a client-side interface (in each language) that different
authentication mechanisms can be plugged in to talk to the REST catalog
3. refactor and make OAuth2 an implementation of that interface. I can
also help with doing the same for AWS Sigv4, and the community can
further support some additional ones like Kerberos, SAML, Google SSO,
etc. based on the individual use cases.
4. turn 2 + 3 into a "REST catalog authentication spec" that documents
about all the supported authentication mechanisms and their defaults.
For OAuth2, the default is to have the auth server at the same
endpoint as the resource server for backwards compatibility, but that
is a configurable property, and we could recommend not to do that
based on security concerns.
Best,
Jack Ye
On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote:
Wondering if the auth endpoints can be separated out to a separate
OpenAPI spec file. Then we still have some reference for
interactions with auth server and make it clear it is not required
as part of the REST catalog server. In most enterprise
environments, auth server is likely a separate server.
On Tue, May 28, 2024 at 1:25 PM Alex Dutra
<alex.du...@dremio.com.invalid> wrote:
Hi,
On point 4, isn't that possible today, Can't that be
achieved with the current token exchange approach, and the
internal implementation of the endpoint?
Unfortunately, no. Token exchange is not widely adopted yet:
for example, Keycloak has only partial support for it, and
Authelia, or Authentik, have no support for it at all.
This, and a few other technical issues with the current
internals of the REST client, makes it nearly impossible to
achieve a good integration of Iceberg REST with the majority
of popular OSS authorization servers.
I am planning to start another email thread to discuss these
practicalities, but let's first reach consensus on the broader
security issues voiced here, before we tackle the details.
Thanks,
Alex Dutra
On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar
<am...@tabular.io> wrote:
I disagree with removing "/v1/oauth/tokens" and I think I
also disagree with the premise that implementing that
endpoint is required, but I can understand how that's not
clear in the spec. I think we can address the required vs
non-required discussion with the capabilities PR.
<https://github.com/apache/iceberg/pull/9940>
It seems like another part of what's driving this
discussion is some concern around how do we enforce REST
catalog implementations which do implement this endpoint
to make sure that the implementation is secure (for
example to avoid the MITM example that was brought up).
This is ultimately a runtime detail. To me it seems like
if we make it clear that such an endpoint should be
implemented respecting OAuth2 standards, and we know that
OAuth2 compliance requires avoiding that MITM situation,
then runtime implementations should just follow the spec there
>3. Enable flexibility for Iceberg REST servers to opt for
other
authorization mechanisms than OAuth 2.0.
>4. Enable REST servers to opt for integrating with any
standard OAuth2 /
OIDC provider (e.g. Okta, Keycloak, Authelia).
I agree with both of these points; again I don't think the
intention is Oauth2 is the only way, but I think the
capabilities PR will make that even more clear.
On point 4, isn't that possible today, Can't that be
achieved with the current token exchange approach, and the
internal implementation of the endpoint? Sorry if I missed
that explanation.
Thanks,
Amogh Jahagirdar
On Tue, May 28, 2024 at 11:13 AM Yufei Gu
<flyrain...@gmail.com> wrote:
Not an expert on authentication, but reading from the
context, I agree that it’s not a good practice to use
a resource server as a token server. The resource
server would need to securely handle and store
credentials or tokens, increasing the risk of
credential theft or leakage. Making the token endpoint
optional will mitigate the issue a bit. But if we want
to disable it completely, it's better to do it now to
prevent any issues and migration costs in the future.
Can we have a consensus on it?
I would prefer to deprecate it to prevent any
intentional and unintentional misuse. We will also
need to change the clients since it connects to the
endpoint by default.
Yufei
On Tue, May 28, 2024 at 8:27 AM Jack Ye
<yezhao...@gmail.com> wrote:
Sounds like we should try to finalize a consensus
around
https://github.com/apache/iceberg/pull/9940, so
that we make it very clear what APIs/features are
optional.
-Jack
On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong
<fo...@apache.org> wrote:
Hey Robert,
Sorry for the late reply as I was out last
week. I'm not an OAuth guru either, but some
context from my end.
* Credentials (for example
username/password) must _never_ be sent to
the resource server, only to the
authorization server.
In an earlier discussion
<https://github.com/apache/iceberg/pull/8976>,
it was agreed that the resource server can
also function as the authorization server. But
the roles can also be separate.
1.2. As long as OAuth2 is the only
mechanism supported by the Iceberg
client, make the existing client parameter
“oauth2-server-uri”
mandatory. The Iceberg REST catalog must
fail to initialize if the
“oauth2-server-uri” parameter is not defined.
It can also be that there is no
authentication in the case of an internal REST
catalog. For example, the iceberg-rest-image
<https://github.com/tabular-io/iceberg-rest-image>
that we use for integration tests in PyIceberg.
We think that Apache Iceberg REST Catalog
spec should not mandate that a
catalog implementation responds to
requests to produce Auth Tokens
(since the REST spec v1 defines a
/v1/tokens endpoint, current
implementations have to take deliberate
actions when responding to those
requests, whether with successful token
responses or with “access
denied” or “unsupported” responses).
The `/v1/tokens` endpoint is optional
<https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489>.
* Credentials (for example
username/password) must _never_ be sent to
the resource server, only to the
authorization server.
I fully agree!
Even if an Iceberg REST server does not
implement the ‘/v1/oauth/tokens’
endpoint, it can still receive requests to
‘/v1/oauth/tokens’ containing
clear text credentials, if clients are
misconfigured (humans do make
mistakes) - it’s a non-zero risk - bad
actors can implement/intercept
that ‘/v1/oauth/tokens’ endpoint and just
wait for misconfigured
clients to send credentials.
I think the wording is chosen badly. It should
not send any credentials, but the code (as in
this example
<https://developers.google.com/identity/protocols/oauth2#installed> by
GCS).
I think Jack makes a good point with AWS
SigV4 Authentication. I suppose, in REST
Catalog implementations that support that
auth method, the /v1/oauth/token Catalog
REST endpoint is redundant.
There are other cloud providers next to AWS.
Kind regards,
Fokko
Op do 23 mei 2024 om 15:49 schreef Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>:
I think Jack makes a good point with AWS
SigV4 Authentication. I suppose, in REST
Catalog implementations that support that
auth method, the /v1/oauth/token Catalog
REST endpoint is redundant.
Cheers,
Dmitri.
On Thu, May 23, 2024 at 9:20 AM Jack Ye
<yezhao...@gmail.com> wrote:
I do not know enough details about
OAuth to make comments about this
issue, but just regarding the
statement "OAuth2 is the only
mechanism supported by the Iceberg
client", AWS Sigv4 auth is also
supported, at least in the Java client
implementation
<https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>.
It would be nice if we formalize that
in the spec, at least define it as a
generic authorization header.
Best,
Jack Ye
On Thu, May 23, 2024 at 2:51 AM Robert
Stupp <sn...@snazy.de> wrote:
Hi all,
Iceberg REST implementations,
either accessible on the public
internet
or inside an organization, are
usually being secured using
appropriate
authorization mechanisms. The
Nessie team is looking at
implementing the
Iceberg REST specification and
have some questions around the
security
endpoint(s) defined in the spec.
TL;DR we have questions
(potentially concerns) about
having the
‘/v1/oauth/tokens’ endpoint, for
the reasons explained below. We think
that ‘/v1/oauth/tokens’ poses
potential security and OAuth2
compliance
issues, and imposes how
authorization should be implemented.
* As an open table format, it
would be good for Iceberg to focus
on the
table format / catalog and not how
authorization is implemented. The
existence of an OAuth endpoint
pushes implementations to adopt
authorization using only OAuth,
whereas the implementers might choose
several other ways to implement
authorization (e.g. SAML). In our
opinion the spec should leave it
open to the implementation to decide
how authorization will be implemented.
* The existence of that endpoint
also pushes operators of Iceberg REST
endpoints into the authorization
service business.
* Clients might expose their
clear-text credentials to the wrong
service, if the (correct) OAuth
endpoint is not configured (humans do
make mistakes).
* (Naive) Iceberg REST servers may
proxy requests received for
‘/v1/oauth/tokens’ - and
effectively become a
“man-in-the-middle”, which
is not fully compliant with the
OAuth 2.0 specification.
Our goals with this discussion are:
1. Secure the Iceberg REST
specification by preventing
accidental
misuse/misimplementation.
2. Prevent that Iceberg REST to
get into dictating the “authorization
server specifics”.
3. Enable flexibility for Iceberg
REST servers to opt for other
authorization mechanisms than
OAuth 2.0.
4. Enable REST servers to opt for
integrating with any standard
OAuth2 /
OIDC provider (e.g. Okta,
Keycloak, Authelia).
OAuth 2.0 [1] is one of the common
standards accepted in the industry.
It defines a secure mechanism to
access resources (here: Iceberg REST
endpoints). The most important
aspect for OAuth 2.0 resources is
that
(Iceberg REST) servers do not
(have to) support password
authentication,
especially considering the
security weaknesses inherent in
passwords.
Compromised passwords result in
compromised data protected by that
password.
Therefore OAuth 2.0 defines a set
of strict rules. Some of these are:
* Credentials (for example
username/password) must _never_ be
sent to
the resource server, only to the
authorization server.
* OAuth 2.0 refresh tokens must
_never_ be sent to the resource
server,
only to the authorization server.
(“Unlike access tokens, refresh
tokens
are intended for use only with
authorization servers and are
never sent
to resource servers.”, cite from
section 1.5 of the OAuth RFC 6749.)
* While the OAuth RFC states "The
authorization server may be the same
server as the resource server or a
separate entity", this should not be
mandated. i.e the spec should be
open to supporting implementations
that
have the authorization server and
resource server co-located as well as
separate.
The Iceberg PR 4771 [2] added the
OpenAPI path ‘/v1/oauth/tokens’,
intentionally marked to “To
exchange client credentials
(client ID and
secret) for an access token. This
uses the client credentials flow.”
[3]. Technically: client ID and
secret are submitted using a HTTP
POST
request to that Iceberg REST endpoint.
Having ‘/v1/oauth/tokens’ in the
Iceberg REST specification can easily
be seen as a hard requirement. In
order to implement this in compliance
with the OAuth 2.0 spec, that
‘/v1/oauth/tokens’ MUST be the
authorization server. If users do
not (want to) implement an
authorization server, the only way
to implement this ‘/v1/oauth/tokens’
endpoint would be to proxy
‘/v1/oauth/tokens’ to the actual
authorization server, which means,
that this proxy technically becomes a
“man in the middle” - knowing both
all credentials and all involved
tokens.
Even if an Iceberg REST server
does not implement the
‘/v1/oauth/tokens’
endpoint, it can still receive
requests to ‘/v1/oauth/tokens’
containing
clear text credentials, if clients
are misconfigured (humans do make
mistakes) - it’s a non-zero risk -
bad actors can implement/intercept
that ‘/v1/oauth/tokens’ endpoint
and just wait for misconfigured
clients to send credentials.
Further usages of the REST Catalog
API path ‘/v1/oauth/tokens’ are “To
exchange a client token and an
identity token for a more specific
access
token. This uses the token
exchange flow.” and “To exchange
an access
token for one with the same claims
and a refreshed expiration period
This uses the token exchange
flow.” Both usages should and can be
implemented differently.
Apache Iceberg, as a table format
project, should recommend protecting
sensitive information. But Iceberg
should not mandate _how_ that
protection is implemented - but
the Iceberg REST specification does
effectively mandate OAuth 2.0,
because other Iceberg REST
endpoints do
refer/require OAuth 2.0 specifics.
Users that want to use other
mechanisms, because they are
forced to do so by their
organization,
would be locked out of Iceberg REST.
Apache Iceberg should not mandate
OAuth 2.0 as the only option - for
the
sake of openness for the project
and flexibility for the server
implementations.
We think that Apache Iceberg REST
Catalog spec should not mandate
that a
catalog implementation responds to
requests to produce Auth Tokens
(since the REST spec v1 defines a
/v1/tokens endpoint, current
implementations have to take
deliberate actions when responding
to those
requests, whether with successful
token responses or with “access
denied” or “unsupported” responses).
We propose the following actions:
1. Immediate mitigation:
1.1. Remove the ‘/v1/oauth/tokens’
endpoint entirely from the Iceberg’s
OpenAPI spec w/o replacement.
1.2. As long as OAuth2 is the only
mechanism supported by the Iceberg
client, make the existing client
parameter “oauth2-server-uri”
mandatory. The Iceberg REST
catalog must fail to initialize if
the
“oauth2-server-uri” parameter is
not defined.
1.3. Remove all fallbacks to the
‘/v1/oauth/tokens’ endpoint.
1.4. Forbid or discourage the
communication of tokens from any
Iceberg
REST Catalog endpoint, both via
the "token" property or with any
of the
"urn:ietf:params:oauth:token-type:*"
properties.
2. As a follow up: We’d propose a
couple of implementation fixes and
changes and test improvements.
3. As a follow up: Define a
discovery mechanism for both the
Iceberg
REST base URI and OAuth 2.0
endpoints/discovery, which allows
users to
use a single URI to securely
access Iceberg REST endpoints.
4. As a follow up: Not new, but we
also want to improve the Iceberg REST
specification via the “new” REST
proposal.
We do not think that adding
recommendations to
inline-documentation is
enough to fully mitigate the above
concerns.
References:
[1] RFC 6749 - The OAuth 2.0
Authorization Framework,
https://datatracker.ietf.org/doc/html/rfc6749
[2] Iceberg pull request 4771 -
Core: Add OAuth2 to REST catalog
spec -
https://github.com/apache/iceberg/pull/4771
[3] Iceberg pull request 4843 -
Spec: Add more context about
OAuth2 to
the REST spec -
https://github.com/apache/iceberg/pull/4843
--
Robert Stupp
@snazy