> In general, we find fewer, more focused proposals allow for better
> discussion and faster resolution.

Let us have a milestone called "REST catalog v2 spec" (similar to
https://github.com/apache/iceberg/milestone/42)
and keep the multiple smaller proposals organized under that.

- Ajantha

On Thu, May 30, 2024 at 9:46 PM Daniel Weeks <dwe...@apache.org> wrote:

> Thanks JB,
>
> I do feel like the discussion around OAuth2, SigV4, etc. is a big enough
> topic that we wouldn't want to bundle it with other proposed changes.  I
> think the discussion around both what is included in the spec and what the
> reference implementations will be for each of these protocols will be a
> rather large topic.
>
> In general, we find fewer, more focused proposals allow for better
> discussion and faster resolution.
>
> Can you split that section out into a separate document and create an
> issue for the auth changes?
>
> Thanks,
> -Dan
>
> On Thu, May 30, 2024 at 4:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Jack,
>>
>> Here are my comments:
>>
>> 1. I don't think we should remove the oauth2 endpoint directly like
>> this. I would first deprecate the endpoint and plan the removal in the
>> spec v2.
>> 4. I agree, and it has to be pluggable.
>>
>> I updated the REST Spec v2 proposal including first steps on v1:
>>
>> https://docs.google.com/document/d/1JUtFpdEoa6IAKt1EzJi_re0PUbh56XnfUtRe5WAfl0s/edit?usp=sharing
>>
>> As already shared on the mailing list, I'm working on a PR to have
>> interfaces with JAXRS/Swagger annotations to generate OpenAPI
>> JSON/YAML with the swagger-gradle-plugin.
>>
>> Thanks,
>> Regards
>> JB
>>
>> On Wed, May 29, 2024 at 8:03 PM Jack Ye <yezhao...@gmail.com> wrote:
>> >
>> > Just to reiterate my points discussed in the community sync here: the
>> more I think about it the more I agree the OAuth endpoint should be removed
>> from the REST spec. Even though the endpoint is optional, and even if we do
>> not care about the security concerns, it still provides users an impression
>> that the endpoint "should" be implemented, or "is the preferred
>> authentication mechanism". And as we have found out, the server capability
>> proposal does not cover this case since this is the first endpoint to hit
>> before the GetConfig endpoint.
>> >
>> > As Ryan said, if we want to do that we need an alternative plan. I
>> don't have anything concrete, but here is my line of thought:
>> >
>> > 1. remove OAuth2 endpoint from the "REST OpenAPI spec"
>> >
>> > 2. create a client-side interface (in each language) that different
>> authentication mechanisms can be plugged in to talk to the REST catalog
>> >
>> > 3. refactor and make OAuth2 an implementation of that interface. I can
>> also help with doing the same for AWS Sigv4, and the community can further
>> support some additional ones like Kerberos, SAML, Google SSO, etc. based on
>> the individual use cases.
>> >
>> > 4. turn 2 + 3 into a "REST catalog authentication spec" that documents
>> all the supported authentication mechanisms and their defaults. For
>> OAuth2, the default is to have the auth server at the same endpoint as the
>> resource server for backwards compatibility, but that is a configurable
>> property, and we could recommend not to do that based on security concerns.
>> >
>> > Best,
>> > Jack Ye
>> >
>> > On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com>
>> wrote:
>> >>
>> >> Wondering if the auth endpoints can be separated out to a separate
>> OpenAPI spec file. Then we still have some reference for interactions with
>> auth server and make it clear it is not required as part of the REST
>> catalog server. In most enterprise environments, auth server is likely a
>> separate server.
>> >>
>> >> On Tue, May 28, 2024 at 1:25 PM Alex Dutra
>> <alex.du...@dremio.com.invalid> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>>>
>> >>>> On point 4, isn't that possible today? Can't that be achieved with
>> the current token exchange approach, and the internal implementation of the
>> endpoint?
>> >>>
>> >>>
>> >>> Unfortunately, no. Token exchange is not widely adopted yet: for
>> example, Keycloak has only partial support for it, and Authelia, or
>> Authentik, have no support for it at all.
>> >>>
>> >>> This, and a few other technical issues with the current internals of
>> the REST client, make it nearly impossible to achieve a good integration
>> of Iceberg REST with the majority of popular OSS authorization servers.
>> >>>
>> >>> I am planning to start another email thread to discuss these
>> practicalities, but let's first reach consensus on the broader security
>> issues voiced here, before we tackle the details.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Alex Dutra
>> >>>
>> >>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io>
>> wrote:
>> >>>>
>> >>>> I disagree with removing "/v1/oauth/tokens" and I think I also
>> disagree with the premise that implementing that endpoint is required, but
>> I can understand how that's not clear in the spec. I think we can address
>> the required vs non-required discussion with the capabilities PR.
>> >>>>
>> >>>> It seems like another part of what's driving this discussion is a
>> concern about how we ensure that REST catalog implementations which do
>> implement this endpoint implement it securely (for
>> example, avoiding the MITM scenario that was brought up). This is ultimately
>> a runtime detail. To me it seems like if we make it clear that such an
>> endpoint should be implemented respecting OAuth2 standards, and we know
>> that OAuth2 compliance requires avoiding that MITM situation, then runtime
>> implementations should just follow the spec there.
>> >>>>
>> >>>> >3. Enable flexibility for Iceberg REST servers to opt for other
>> >>>> authorization mechanisms than OAuth 2.0.
>> >>>> >4. Enable REST servers to opt for integrating with any standard
>> OAuth2 /
>> >>>> OIDC provider (e.g. Okta, Keycloak, Authelia).
>> >>>>
>> >>>> I agree with both of these points; again I don't think the intention
>> is that OAuth2 is the only way, but I think the capabilities PR will make that
>> even more clear.
>> >>>> On point 4, isn't that possible today? Can't that be achieved with
>> the current token exchange approach, and the internal implementation of the
>> endpoint? Sorry if I missed that explanation.
>> >>>>
>> >>>> Thanks,
>> >>>>
>> >>>> Amogh Jahagirdar
>> >>>>
>> >>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com>
>> wrote:
>> >>>>>
>> >>>>> Not an expert on authentication, but reading from the context, I
>> agree that it’s not a good practice to use a resource server as a token
>> server. The resource server would need to securely handle and store
>> credentials or tokens, increasing the risk of credential theft or leakage.
>> Making the token endpoint optional will mitigate the issue a bit. But if we
>> want to disable it completely, it's better to do it now to prevent any
>> issues and migration costs in the future. Can we have a consensus on it?
>> >>>>>
>> >>>>>
>> >>>>> I would prefer to deprecate it to prevent any intentional and
>> unintentional misuse. We will also need to change the clients since they
>> connect to the endpoint by default.
>> >>>>>
>> >>>>>
>> >>>>> Yufei
>> >>>>>
>> >>>>>
>> >>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com>
>> wrote:
>> >>>>>>
>> >>>>>> Sounds like we should try to finalize a consensus around
>> https://github.com/apache/iceberg/pull/9940, so that we make it very
>> clear what APIs/features are optional.
>> >>>>>>
>> >>>>>> -Jack
>> >>>>>>
>> >>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>> >>>>>>>
>> >>>>>>> Hey Robert,
>> >>>>>>>
>> >>>>>>> Sorry for the late reply as I was out last week. I'm not an OAuth
>> guru either, but some context from my end.
>> >>>>>>>
>> >>>>>>>> * Credentials (for example username/password) must _never_ be
>> sent to
>> >>>>>>>> the resource server, only to the authorization server.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> In an earlier discussion, it was agreed that the resource server
>> can also function as the authorization server. But the roles can also be
>> separate.
>> >>>>>>>
>> >>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the
>> Iceberg
>> >>>>>>>> client, make the existing client parameter “oauth2-server-uri”
>> >>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if
>> the
>> >>>>>>>> “oauth2-server-uri” parameter is not defined.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> It can also be that there is no authentication in the case of an
>> internal REST catalog. For example, the iceberg-rest-image that we use for
>> integration tests in PyIceberg.
>> >>>>>>>
>> >>>>>>>> We think that Apache Iceberg REST Catalog spec should not
>> mandate that a
>> >>>>>>>> catalog implementation responds to requests to produce Auth
>> Tokens
>> >>>>>>>> (since the REST spec v1 defines a /v1/oauth/tokens endpoint, current
>> >>>>>>>> implementations have to take deliberate actions when responding
>> to those
>> >>>>>>>> requests, whether with successful token responses or with “access
>> >>>>>>>> denied” or “unsupported” responses).
>> >>>>>>>
>> >>>>>>> The `/v1/oauth/tokens` endpoint is optional.
>> >>>>>>>
>> >>>>>>>> * Credentials (for example username/password) must _never_ be
>> sent to
>> >>>>>>>> the resource server, only to the authorization server.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I fully agree!
>> >>>>>>>
>> >>>>>>>> Even if an Iceberg REST server does not implement the
>> ‘/v1/oauth/tokens’
>> >>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’
>> containing
>> >>>>>>>> clear text credentials, if clients are misconfigured (humans do
>> make
>> >>>>>>>> mistakes) - it’s a non-zero risk - bad actors can
>> implement/intercept
>> >>>>>>>> that  ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured
>> >>>>>>>> clients to send credentials.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> I think the wording is chosen badly. It should not send any
>> credentials, but the code (as in this example by GCS).
>> >>>>>>>
>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I
>> suppose, in REST Catalog implementations that support that auth method, the
>> /v1/oauth/tokens Catalog REST endpoint is redundant.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> There are other cloud providers next to AWS.
>> >>>>>>>
>> >>>>>>> Kind regards,
>> >>>>>>> Fokko
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Op do 23 mei 2024 om 15:49 schreef Dmitri Bourlatchkov
>> <dmitri.bourlatch...@dremio.com.invalid>:
>> >>>>>>>>
>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I
>> suppose, in REST Catalog implementations that support that auth method, the
>> /v1/oauth/tokens Catalog REST endpoint is redundant.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Dmitri.
>> >>>>>>>>
>> >>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> I do not know enough details about OAuth to make comments about
>> this issue, but just regarding the statement "OAuth2 is the only mechanism
>> supported by the Iceberg client", AWS Sigv4 auth is also supported, at
>> least in the Java client implementation. It would be nice if we formalize
>> that in the spec, at least define it as a generic authorization header.
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Jack Ye
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Hi all,
>> >>>>>>>>>>
>> >>>>>>>>>> Iceberg REST implementations, either accessible on the public
>> internet
>> >>>>>>>>>> or inside an organization, are usually being secured using
>> appropriate
>> >>>>>>>>>> authorization mechanisms. The Nessie team is looking at
>> implementing the
>> >>>>>>>>>> Iceberg REST specification and have some questions around the
>> security
>> >>>>>>>>>> endpoint(s) defined in the spec.
>> >>>>>>>>>>
>> >>>>>>>>>> TL;DR we have questions (potentially concerns) about having the
>> >>>>>>>>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below.
>> We think
>> >>>>>>>>>> that ‘/v1/oauth/tokens’ poses potential security and OAuth2
>> compliance
>> >>>>>>>>>> issues, and imposes how authorization should be implemented.
>> >>>>>>>>>> * As an open table format, it would be good for Iceberg to
>> focus on the
>> >>>>>>>>>> table format / catalog and not how authorization is
>> implemented. The
>> >>>>>>>>>> existence of an OAuth endpoint pushes implementations to adopt
>> >>>>>>>>>> authorization using only OAuth, whereas the implementers might
>> choose
>> >>>>>>>>>> several other ways to implement authorization (e.g. SAML). In
>> our
>> >>>>>>>>>> opinion the spec should leave it open to the implementation to
>> decide
>> >>>>>>>>>> how authorization will be implemented.
>> >>>>>>>>>> * The existence of that endpoint also pushes operators of
>> Iceberg REST
>> >>>>>>>>>> endpoints into the authorization service business.
>> >>>>>>>>>> * Clients might expose their clear-text credentials to the
>> wrong
>> >>>>>>>>>> service, if the (correct) OAuth endpoint is not configured
>> (humans do
>> >>>>>>>>>> make mistakes).
>> >>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for
>> >>>>>>>>>> ‘/v1/oauth/tokens’ - and effectively become a
>> “man-in-the-middle”, which
>> >>>>>>>>>> is not fully compliant with the OAuth 2.0 specification.
>> >>>>>>>>>>
>> >>>>>>>>>> Our goals with this discussion are:
>> >>>>>>>>>> 1. Secure the Iceberg REST specification by preventing
>> accidental
>> >>>>>>>>>> misuse/misimplementation.
>> >>>>>>>>>> 2. Prevent Iceberg REST from dictating the “authorization
>> >>>>>>>>>> server specifics”.
>> >>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for other
>> >>>>>>>>>> authorization mechanisms than OAuth 2.0.
>> >>>>>>>>>> 4. Enable REST servers to opt for integrating with any
>> standard OAuth2 /
>> >>>>>>>>>> OIDC provider (e.g. Okta, Keycloak, Authelia).
>> >>>>>>>>>>
>> >>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the
>> industry.
>> >>>>>>>>>> It defines a secure mechanism to access resources (here:
>> Iceberg REST
>> >>>>>>>>>> endpoints). The most important aspect for OAuth 2.0 resources
>> is that
>> >>>>>>>>>> (Iceberg REST) servers do not (have to) support password
>> authentication,
>> >>>>>>>>>> especially considering the security weaknesses inherent in
>> passwords.
>> >>>>>>>>>> Compromised passwords result in compromised data protected by
>> that password.
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of
>> these are:
>> >>>>>>>>>> * Credentials (for example username/password) must _never_ be
>> sent to
>> >>>>>>>>>> the resource server, only to the authorization server.
>> >>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the
>> resource server,
>> >>>>>>>>>> only to the authorization server. (“Unlike access tokens,
>> refresh tokens
>> >>>>>>>>>> are intended for use only with authorization servers and are
>> never sent
>> >>>>>>>>>> to resource servers.”, cite from section 1.5 of the OAuth RFC
>> 6749.)
>> >>>>>>>>>> * While the OAuth RFC states "The authorization server may be
>> the same
>> >>>>>>>>>> server as the resource server or a separate entity", this
>> should not be
>> >>>>>>>>>> mandated; i.e., the spec should be open to supporting
>> implementations that
>> >>>>>>>>>> have the authorization server and resource server co-located
>> as well as
>> >>>>>>>>>> separate.
>> >>>>>>>>>>
>> >>>>>>>>>> The Iceberg PR 4771 [2] added the OpenAPI path
>> ‘/v1/oauth/tokens’,
>> >>>>>>>>>> intentionally described as “To exchange client credentials
>> (client ID and
>> >>>>>>>>>> secret) for an access token. This uses the client credentials
>> flow.”
>> >>>>>>>>>> [3]. Technically: client ID and secret are submitted using an
>> HTTP POST
>> >>>>>>>>>> request to that Iceberg REST endpoint.
>> >>>>>>>>>>
>> >>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification
>> can easily
>> >>>>>>>>>> be seen as a hard requirement. In order to implement this in
>> compliance
>> >>>>>>>>>> with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the
>> >>>>>>>>>> authorization server. If users do not (want to) implement an
>> >>>>>>>>>> authorization server, the only way to implement this
>> ‘/v1/oauth/tokens’
>> >>>>>>>>>> endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual
>> >>>>>>>>>> authorization server, which means, that this proxy technically
>> becomes a
>> >>>>>>>>>> “man in the middle” - knowing both all credentials and all
>> involved tokens.
>> >>>>>>>>>>
>> >>>>>>>>>> Even if an Iceberg REST server does not implement the
>> ‘/v1/oauth/tokens’
>> >>>>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’
>> containing
>> >>>>>>>>>> clear text credentials, if clients are misconfigured (humans
>> do make
>> >>>>>>>>>> mistakes) - it’s a non-zero risk - bad actors can
>> implement/intercept
>> >>>>>>>>>> that  ‘/v1/oauth/tokens’ endpoint and just wait for
>> misconfigured
>> >>>>>>>>>> clients to send credentials.
>> >>>>>>>>>>
>> >>>>>>>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’
>> are “To
>> >>>>>>>>>> exchange a client token and an identity token for a more
>> specific access
>> >>>>>>>>>> token. This uses the token exchange flow.” and “To exchange an
>> access
>> >>>>>>>>>> token for one with the same claims and a refreshed expiration
>> period.
>> >>>>>>>>>> This uses the token exchange flow.” Both usages should and can
>> be
>> >>>>>>>>>> implemented differently.
>> >>>>>>>>>>
>> >>>>>>>>>> Apache Iceberg, as a table format project, should recommend
>> protecting
>> >>>>>>>>>> sensitive information. But Iceberg should not mandate _how_
>> that
>> >>>>>>>>>> protection is implemented - but the Iceberg REST specification
>> does
>> >>>>>>>>>> effectively mandate OAuth 2.0, because other Iceberg REST
>> endpoints do
>> >>>>>>>>>> refer/require OAuth 2.0 specifics. Users that want to use other
>> >>>>>>>>>> mechanisms, because they are forced to do so by their
>> organization,
>> >>>>>>>>>> would be locked out of Iceberg REST.
>> >>>>>>>>>>
>> >>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option
>> - for the
>> >>>>>>>>>> sake of openness for the project and flexibility for the server
>> >>>>>>>>>> implementations.
>> >>>>>>>>>>
>> >>>>>>>>>> We think that Apache Iceberg REST Catalog spec should not
>> mandate that a
>> >>>>>>>>>> catalog implementation responds to requests to produce Auth
>> Tokens
>> >>>>>>>>>> (since the REST spec v1 defines a /v1/oauth/tokens endpoint, current
>> >>>>>>>>>> implementations have to take deliberate actions when
>> responding to those
>> >>>>>>>>>> requests, whether with successful token responses or with
>> “access
>> >>>>>>>>>> denied” or “unsupported” responses).
>> >>>>>>>>>>
>> >>>>>>>>>> We propose the following actions:
>> >>>>>>>>>> 1. Immediate mitigation:
>> >>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from Iceberg’s
>> >>>>>>>>>> OpenAPI spec without replacement.
>> >>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the
>> Iceberg
>> >>>>>>>>>> client, make the existing client parameter “oauth2-server-uri”
>> >>>>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if
>> the
>> >>>>>>>>>> “oauth2-server-uri” parameter is not defined.
>> >>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint.
>> >>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any
>> Iceberg
>> >>>>>>>>>> REST Catalog endpoint, both via the "token" property or with
>> any of the
>> >>>>>>>>>> "urn:ietf:params:oauth:token-type:*" properties.
>> >>>>>>>>>> 2. As a follow up: We’d propose a couple of implementation
>> fixes and
>> >>>>>>>>>> changes and test improvements.
>> >>>>>>>>>> 3. As a follow up: Define a discovery mechanism for both the
>> Iceberg
>> >>>>>>>>>> REST base URI and OAuth 2.0 endpoints/discovery, which allows
>> users to
>> >>>>>>>>>> use a single URI to securely access Iceberg REST endpoints.
>> >>>>>>>>>> 4. As a follow up: Not new, but we also want to improve the
>> Iceberg REST
>> >>>>>>>>>> specification via the “new” REST proposal.
>> >>>>>>>>>>
>> >>>>>>>>>> We do not think that adding recommendations to
>> inline-documentation is
>> >>>>>>>>>> enough to fully mitigate the above concerns.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> References:
>> >>>>>>>>>>
>> >>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework,
>> >>>>>>>>>> https://datatracker.ietf.org/doc/html/rfc6749
>> >>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST
>> catalog spec -
>> >>>>>>>>>> https://github.com/apache/iceberg/pull/4771
>> >>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about
>> OAuth2 to
>> >>>>>>>>>> the REST spec - https://github.com/apache/iceberg/pull/4843
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> Robert Stupp
>> >>>>>>>>>> @snazy
>> >>>>>>>>>>
>>
>
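
For reference, the client-side plug-in idea from Jack's points 2 and 3 in the
quoted thread (an interface that different authentication mechanisms implement,
with OAuth2 as just one implementation) could look roughly like the sketch
below. This is only an illustration, not an existing Iceberg API: the names
AuthManager, NoAuth, BearerTokenAuth, load_auth and the "auth-type" property are
all hypothetical, and the "token" property is assumed to carry a pre-obtained
bearer token.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class AuthManager(ABC):
    """Hypothetical plug-in point: produces the headers to attach to a catalog request."""

    @abstractmethod
    def auth_headers(self, method: str, url: str) -> Dict[str, str]:
        ...


class NoAuth(AuthManager):
    """For catalogs without authentication, e.g. a local test catalog."""

    def auth_headers(self, method: str, url: str) -> Dict[str, str]:
        return {}


class BearerTokenAuth(AuthManager):
    """Sends a token that was obtained elsewhere (e.g. from a separate authorization server)."""

    def __init__(self, token: str) -> None:
        self._token = token

    def auth_headers(self, method: str, url: str) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self._token}"}


# Hypothetical registry: mechanism name -> factory taking the catalog properties.
# SigV4, Kerberos, etc. would register additional factories here.
_REGISTRY: Dict[str, Callable[[Dict[str, str]], AuthManager]] = {
    "none": lambda props: NoAuth(),
    "bearer": lambda props: BearerTokenAuth(props["token"]),
}


def load_auth(props: Dict[str, str]) -> AuthManager:
    """Pick the mechanism named by the (hypothetical) 'auth-type' property."""
    kind = props.get("auth-type", "none")
    try:
        factory = _REGISTRY[kind]
    except KeyError:
        raise ValueError(f"unknown auth type: {kind!r}") from None
    return factory(props)
```

With something like this, the REST client only calls auth_headers() before each
request and does not need to know which mechanism is configured.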

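Similarly, items 1.2 and 1.3 of Robert's proposal in the quoted thread (make
"oauth2-server-uri" mandatory and remove fallbacks to the catalog's own token
endpoint) amount to a small client-side check. The sketch below is a minimal
illustration under assumptions: resolve_token_endpoint is a hypothetical helper,
the properties are assumed to arrive as a plain dict, and "uri" is assumed to be
the catalog base URI property.

```python
from typing import Dict


def resolve_token_endpoint(props: Dict[str, str]) -> str:
    """Return the configured OAuth2 token endpoint.

    Fails instead of deriving '<catalog uri>/v1/oauth/tokens', so client
    credentials are never sent to the catalog (resource) server by accident.
    """
    server_uri = (props.get("oauth2-server-uri") or "").strip()
    if not server_uri:
        raise ValueError(
            "'oauth2-server-uri' must be set; "
            "no fallback to the catalog's /v1/oauth/tokens endpoint is attempted"
        )
    return server_uri
```

For example, resolve_token_endpoint({"uri": "https://catalog.example"}) raises
ValueError, while adding "oauth2-server-uri" to the properties returns that
value unchanged.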