I feel there's general openness to removing the specific OAuth2 portions of the REST Spec as they're largely duplicative of what's covered in the OAuth2 RFC.
However, I do think we need a more complete discussion around client reference implementation and I would be opposed to making any changes to the spec until that is resolved. We also need to consider implications for deprecation cycles (as opposed to just removing existing spec/behaviors). Robert, there's an improvement proposal process <https://iceberg.apache.org/contribute/#apache-iceberg-improvement-proposals> to capture the discussion and proposed changes and I think we should gather the discussion there. -Dan On Wed, May 29, 2024 at 1:02 PM Robert Stupp <sn...@snazy.de> wrote: > Jack's proposal is pretty much along what we've been thinking of. > > We can help with OAuth 2 client implementations for Java that support > bearer, client-credentials, password as well as authorization code and > device code flows (see docs [1] and implementation [2]. All implementations > are built via NessieAutnenticationProvider [3] implementations. I'm sure we > can figure out a way to use the code both in Nessie and Iceberg (it's about > 7k lines of code just for oauth2 including tests). Technically those are > all HTTP "request/response interceptors". > > For 1), I went ahead and opened a draft PR [4] > > Regarding the "backwards compatibility", I think it is better to be > explicit and always require the "oauth-server-uri" parameter. We can make > the implementation deal with relative URIs, which would be resolved against > the given Iceberg-REST base "uri" parameter value. This prevents > Iceberg-REST users from accidentally exposing their client-ID & > client-secret to Iceberg-REST service providers. > > IMO we should also make sure that whatever is implemented works against > major OAuth2 providers/servers, for example Keycloak, Authelia, Authentik, > Okta, and have integration tests to validate the implementations. > > Also +1 on having a "REST catalog authentication spec". We can list all > known auth mechanisms there, including recommended config option keys and > possible values. However, implementations may differ across languages > (Java, Python, Rust, Go), so that might only serve as a > guideline/recommendation, not as a strict requirement. This can also serve > as documentation/reference of what each client (language/version) supports > and how. > > Robert > > > [1] https://projectnessie.org/nessie-latest/client_config/#oauth2-settings > > [2] > https://github.com/projectnessie/nessie/tree/main/api/client/src/main/java/org/projectnessie/client/auth/oauth2 > > [3] > https://github.com/projectnessie/nessie/blob/main/api/client/src/main/java/org/projectnessie/client/auth/NessieAuthenticationProvider.java > > [4] https://github.com/apache/iceberg/pull/10398 > On 29.05.24 20:03, Jack Ye wrote: > > Just to reiterate my points discussed in the community sync here: the more > I think about it the more I agree the OAuth endpoint *should be removed > from the REST spec*. Even though the endpoint is optional, and even if we > do not care about the security concerns, it still provides users an > impression that the endpoint "should" be implemented, or "is the preferred > authentication mechanism". And as we have found out, the server capability > proposal does not cover this case since this is the first endpoint to hit > before the GetConfig endpoint. > > As Ryan said, if we want to do that we need an alternative plan. I don't > have anything concrete, but here is my line of thought: > > 1. remove OAuth2 endpoint from the "REST OpenAPI spec" > > 2. create a client-side interface (in each language) that different > authentication mechanisms can be plugged in to talk to the REST catalog > > 3. refactor and make OAuth2 an implementation of that interface. I can > also help with doing the same for AWS Sigv4, and the community can further > support some additional ones like Kerberos, SAML, Google SSO, etc. based on > the individual use cases. > > 4. turn 2 + 3 into a "REST catalog authentication spec" that documents > about all the supported authentication mechanisms and their defaults. For > OAuth2, the default is to have the auth server at the same endpoint as the > resource server for backwards compatibility, but that is a configurable > property, and we could recommend not to do that based on security concerns. > > Best, > Jack Ye > > On Wed, May 29, 2024 at 10:28 AM Steven Wu <stevenz...@gmail.com> wrote: > >> Wondering if the auth endpoints can be separated out to a separate >> OpenAPI spec file. Then we still have some reference for interactions with >> auth server and make it clear it is not required as part of the REST >> catalog server. In most enterprise environments, auth server is likely a >> separate server. >> >> On Tue, May 28, 2024 at 1:25 PM Alex Dutra >> <alex.du...@dremio.com.invalid> <alex.du...@dremio.com.invalid> wrote: >> >>> Hi, >>> >>> >>>> On point 4, isn't that possible today, Can't that be achieved with the >>>> current token exchange approach, and the internal implementation of the >>>> endpoint? >>> >>> >>> Unfortunately, no. Token exchange is not widely adopted yet: for >>> example, Keycloak has only partial support for it, and Authelia, or >>> Authentik, have no support for it at all. >>> >>> This, and a few other technical issues with the current internals of the >>> REST client, makes it nearly impossible to achieve a good integration of >>> Iceberg REST with the majority of popular OSS authorization servers. >>> >>> I am planning to start another email thread to discuss these >>> practicalities, but let's first reach consensus on the broader security >>> issues voiced here, before we tackle the details. >>> >>> Thanks, >>> >>> Alex Dutra >>> >>> On Tue, May 28, 2024 at 8:41 PM Amogh Jahagirdar <am...@tabular.io> >>> wrote: >>> >>>> I disagree with removing "/v1/oauth/tokens" and I think I also disagree >>>> with the premise that implementing that endpoint is required, but I can >>>> understand how that's not clear in the spec. I think we can address the >>>> required vs non-required discussion with the capabilities PR. >>>> <https://github.com/apache/iceberg/pull/9940> >>>> >>>> It seems like another part of what's driving this discussion is some >>>> concern around how do we enforce REST catalog implementations which do >>>> implement this endpoint to make sure that the implementation is secure (for >>>> example to avoid the MITM example that was brought up). This is ultimately >>>> a runtime detail. To me it seems like if we make it clear that such an >>>> endpoint should be implemented respecting OAuth2 standards, and we know >>>> that OAuth2 compliance requires avoiding that MITM situation, then runtime >>>> implementations should just follow the spec there >>>> >>>> >3. Enable flexibility for Iceberg REST servers to opt for other >>>> authorization mechanisms than OAuth 2.0. >>>> >4. Enable REST servers to opt for integrating with any standard OAuth2 >>>> / >>>> OIDC provider (e.g. Okta, Keycloak, Authelia). >>>> >>>> I agree with both of these points; again I don't think the intention is >>>> Oauth2 is the only way, but I think the capabilities PR will make that even >>>> more clear. >>>> On point 4, isn't that possible today, Can't that be achieved with the >>>> current token exchange approach, and the internal implementation of the >>>> endpoint? Sorry if I missed that explanation. >>>> >>>> Thanks, >>>> >>>> Amogh Jahagirdar >>>> >>>> On Tue, May 28, 2024 at 11:13 AM Yufei Gu <flyrain...@gmail.com> wrote: >>>> >>>>> Not an expert on authentication, but reading from the context, I agree >>>>> that it’s not a good practice to use a resource server as a token server. >>>>> The resource server would need to securely handle and store credentials or >>>>> tokens, increasing the risk of credential theft or leakage. Making the >>>>> token endpoint optional will mitigate the issue a bit. But if we want to >>>>> disable it completely, it's better to do it now to prevent any issues and >>>>> migration costs in the future. Can we have a consensus on it? >>>>> >>>>> >>>>> I would prefer to deprecate it to prevent any intentional and >>>>> unintentional misuse. We will also need to change the clients since it >>>>> connects to the endpoint by default. >>>>> >>>>> >>>>> Yufei >>>>> >>>>> >>>>> On Tue, May 28, 2024 at 8:27 AM Jack Ye <yezhao...@gmail.com> wrote: >>>>> >>>>>> Sounds like we should try to finalize a consensus around >>>>>> https://github.com/apache/iceberg/pull/9940, so that we make it very >>>>>> clear what APIs/features are optional. >>>>>> >>>>>> -Jack >>>>>> >>>>>> On Tue, May 28, 2024 at 7:25 AM Fokko Driesprong <fo...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Hey Robert, >>>>>>> >>>>>>> Sorry for the late reply as I was out last week. I'm not an OAuth >>>>>>> guru either, but some context from my end. >>>>>>> >>>>>>> * Credentials (for example username/password) must _never_ be sent to >>>>>>>> the resource server, only to the authorization server. >>>>>>> >>>>>>> >>>>>>> In an earlier discussion >>>>>>> <https://github.com/apache/iceberg/pull/8976>, it was agreed that >>>>>>> the resource server can also function as the authorization server. But >>>>>>> the >>>>>>> roles can also be separate. >>>>>>> >>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the Iceberg >>>>>>>> client, make the existing client parameter “oauth2-server-uri” >>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if the >>>>>>>> “oauth2-server-uri” parameter is not defined. >>>>>>> >>>>>>> >>>>>>> It can also be that there is no authentication in the case of an >>>>>>> internal REST catalog. For example, the iceberg-rest-image >>>>>>> <https://github.com/tabular-io/iceberg-rest-image> that we use for >>>>>>> integration tests in PyIceberg. >>>>>>> >>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate >>>>>>>> that a >>>>>>>> catalog implementation responds to requests to produce Auth Tokens >>>>>>>> (since the REST spec v1 defines a /v1/tokens endpoint, current >>>>>>>> implementations have to take deliberate actions when responding to >>>>>>>> those >>>>>>>> requests, whether with successful token responses or with “access >>>>>>>> denied” or “unsupported” responses). >>>>>>> >>>>>>> The `/v1/tokens` endpoint is optional >>>>>>> <https://github.com/apache/iceberg-python/blob/756ae625a2ea0f9c12df78430512ce991f6a1976/pyiceberg/catalog/rest.py#L488-L489> >>>>>>> . >>>>>>> >>>>>>> * Credentials (for example username/password) must _never_ be sent to >>>>>>>> the resource server, only to the authorization server. >>>>>>> >>>>>>> >>>>>>> I fully agree! >>>>>>> >>>>>>> Even if an Iceberg REST server does not implement the >>>>>>>> ‘/v1/oauth/tokens’ >>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ >>>>>>>> containing >>>>>>>> clear text credentials, if clients are misconfigured (humans do make >>>>>>>> mistakes) - it’s a non-zero risk - bad actors can >>>>>>>> implement/intercept >>>>>>>> that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured >>>>>>>> clients to send credentials. >>>>>>> >>>>>>> >>>>>>> I think the wording is chosen badly. It should not send >>>>>>> any credentials, but the code (as in this example >>>>>>> <https://developers.google.com/identity/protocols/oauth2#installed> by >>>>>>> GCS). >>>>>>> >>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I >>>>>>>> suppose, in REST Catalog implementations that support that auth >>>>>>>> method, the >>>>>>>> /v1/oauth/token Catalog REST endpoint is redundant. >>>>>>>> >>>>>>> >>>>>>> There are other cloud providers next to AWS. >>>>>>> >>>>>>> Kind regards, >>>>>>> Fokko >>>>>>> >>>>>>> >>>>>>> >>>>>>> Op do 23 mei 2024 om 15:49 schreef Dmitri Bourlatchkov >>>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>>> <dmitri.bourlatch...@dremio.com.invalid>: >>>>>>> >>>>>>>> I think Jack makes a good point with AWS SigV4 Authentication. I >>>>>>>> suppose, in REST Catalog implementations that support that auth >>>>>>>> method, the >>>>>>>> /v1/oauth/token Catalog REST endpoint is redundant. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Dmitri. >>>>>>>> >>>>>>>> On Thu, May 23, 2024 at 9:20 AM Jack Ye <yezhao...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I do not know enough details about OAuth to make comments about >>>>>>>>> this issue, but just regarding the statement "OAuth2 is the only >>>>>>>>> mechanism >>>>>>>>> supported by the Iceberg client", AWS Sigv4 auth is also supported, at >>>>>>>>> least in the Java client implementation >>>>>>>>> <https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L72>. >>>>>>>>> It would be nice if we formalize that in the spec, at least define it >>>>>>>>> as a >>>>>>>>> generic authorization header. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jack Ye >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, May 23, 2024 at 2:51 AM Robert Stupp <sn...@snazy.de> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Iceberg REST implementations, either accessible on the public >>>>>>>>>> internet >>>>>>>>>> or inside an organization, are usually being secured using >>>>>>>>>> appropriate >>>>>>>>>> authorization mechanisms. The Nessie team is looking at >>>>>>>>>> implementing the >>>>>>>>>> Iceberg REST specification and have some questions around the >>>>>>>>>> security >>>>>>>>>> endpoint(s) defined in the spec. >>>>>>>>>> >>>>>>>>>> TL;DR we have questions (potentially concerns) about having the >>>>>>>>>> ‘/v1/oauth/tokens’ endpoint, for the reasons explained below. We >>>>>>>>>> think >>>>>>>>>> that ‘/v1/oauth/tokens’ poses potential security and OAuth2 >>>>>>>>>> compliance >>>>>>>>>> issues, and imposes how authorization should be implemented. >>>>>>>>>> * As an open table format, it would be good for Iceberg to focus >>>>>>>>>> on the >>>>>>>>>> table format / catalog and not how authorization is implemented. >>>>>>>>>> The >>>>>>>>>> existence of an OAuth endpoint pushes implementations to adopt >>>>>>>>>> authorization using only OAuth, whereas the implementers might >>>>>>>>>> choose >>>>>>>>>> several other ways to implement authorization (e.g. SAML). In our >>>>>>>>>> opinion the spec should leave it open to the implementation to >>>>>>>>>> decide >>>>>>>>>> how authorization will be implemented. >>>>>>>>>> * The existence of that endpoint also pushes operators of Iceberg >>>>>>>>>> REST >>>>>>>>>> endpoints into the authorization service business. >>>>>>>>>> * Clients might expose their clear-text credentials to the wrong >>>>>>>>>> service, if the (correct) OAuth endpoint is not configured >>>>>>>>>> (humans do >>>>>>>>>> make mistakes). >>>>>>>>>> * (Naive) Iceberg REST servers may proxy requests received for >>>>>>>>>> ‘/v1/oauth/tokens’ - and effectively become a >>>>>>>>>> “man-in-the-middle”, which >>>>>>>>>> is not fully compliant with the OAuth 2.0 specification. >>>>>>>>>> >>>>>>>>>> Our goals with this discussion are: >>>>>>>>>> 1. Secure the Iceberg REST specification by preventing accidental >>>>>>>>>> misuse/misimplementation. >>>>>>>>>> 2. Prevent that Iceberg REST to get into dictating the >>>>>>>>>> “authorization >>>>>>>>>> server specifics”. >>>>>>>>>> 3. Enable flexibility for Iceberg REST servers to opt for other >>>>>>>>>> authorization mechanisms than OAuth 2.0. >>>>>>>>>> 4. Enable REST servers to opt for integrating with any standard >>>>>>>>>> OAuth2 / >>>>>>>>>> OIDC provider (e.g. Okta, Keycloak, Authelia). >>>>>>>>>> >>>>>>>>>> OAuth 2.0 [1] is one of the common standards accepted in the >>>>>>>>>> industry. >>>>>>>>>> It defines a secure mechanism to access resources (here: Iceberg >>>>>>>>>> REST >>>>>>>>>> endpoints). The most important aspect for OAuth 2.0 resources is >>>>>>>>>> that >>>>>>>>>> (Iceberg REST) servers do not (have to) support password >>>>>>>>>> authentication, >>>>>>>>>> especially considering the security weaknesses inherent in >>>>>>>>>> passwords. >>>>>>>>>> Compromised passwords result in compromised data protected by >>>>>>>>>> that password. >>>>>>>>>> >>>>>>>>>> Therefore OAuth 2.0 defines a set of strict rules. Some of these >>>>>>>>>> are: >>>>>>>>>> * Credentials (for example username/password) must _never_ be >>>>>>>>>> sent to >>>>>>>>>> the resource server, only to the authorization server. >>>>>>>>>> * OAuth 2.0 refresh tokens must _never_ be sent to the resource >>>>>>>>>> server, >>>>>>>>>> only to the authorization server. (“Unlike access tokens, refresh >>>>>>>>>> tokens >>>>>>>>>> are intended for use only with authorization servers and are >>>>>>>>>> never sent >>>>>>>>>> to resource servers.”, cite from section 1.5 of the OAuth RFC >>>>>>>>>> 6749.) >>>>>>>>>> * While the OAuth RFC states "The authorization server may be the >>>>>>>>>> same >>>>>>>>>> server as the resource server or a separate entity", this should >>>>>>>>>> not be >>>>>>>>>> mandated. i.e the spec should be open to supporting >>>>>>>>>> implementations that >>>>>>>>>> have the authorization server and resource server co-located as >>>>>>>>>> well as >>>>>>>>>> separate. >>>>>>>>>> >>>>>>>>>> The Iceberg PR 4771 [2] added the OpenAPI path >>>>>>>>>> ‘/v1/oauth/tokens’, >>>>>>>>>> intentionally marked to “To exchange client credentials (client >>>>>>>>>> ID and >>>>>>>>>> secret) for an access token. This uses the client credentials >>>>>>>>>> flow.” >>>>>>>>>> [3]. Technically: client ID and secret are submitted using a HTTP >>>>>>>>>> POST >>>>>>>>>> request to that Iceberg REST endpoint. >>>>>>>>>> >>>>>>>>>> Having ‘/v1/oauth/tokens’ in the Iceberg REST specification can >>>>>>>>>> easily >>>>>>>>>> be seen as a hard requirement. In order to implement this in >>>>>>>>>> compliance >>>>>>>>>> with the OAuth 2.0 spec, that ‘/v1/oauth/tokens’ MUST be the >>>>>>>>>> authorization server. If users do not (want to) implement an >>>>>>>>>> authorization server, the only way to implement this >>>>>>>>>> ‘/v1/oauth/tokens’ >>>>>>>>>> endpoint would be to proxy ‘/v1/oauth/tokens’ to the actual >>>>>>>>>> authorization server, which means, that this proxy technically >>>>>>>>>> becomes a >>>>>>>>>> “man in the middle” - knowing both all credentials and all >>>>>>>>>> involved tokens. >>>>>>>>>> >>>>>>>>>> Even if an Iceberg REST server does not implement the >>>>>>>>>> ‘/v1/oauth/tokens’ >>>>>>>>>> endpoint, it can still receive requests to ‘/v1/oauth/tokens’ >>>>>>>>>> containing >>>>>>>>>> clear text credentials, if clients are misconfigured (humans do >>>>>>>>>> make >>>>>>>>>> mistakes) - it’s a non-zero risk - bad actors can >>>>>>>>>> implement/intercept >>>>>>>>>> that ‘/v1/oauth/tokens’ endpoint and just wait for misconfigured >>>>>>>>>> clients to send credentials. >>>>>>>>>> >>>>>>>>>> Further usages of the REST Catalog API path ‘/v1/oauth/tokens’ >>>>>>>>>> are “To >>>>>>>>>> exchange a client token and an identity token for a more specific >>>>>>>>>> access >>>>>>>>>> token. This uses the token exchange flow.” and “To exchange an >>>>>>>>>> access >>>>>>>>>> token for one with the same claims and a refreshed expiration >>>>>>>>>> period >>>>>>>>>> This uses the token exchange flow.” Both usages should and can be >>>>>>>>>> implemented differently. >>>>>>>>>> >>>>>>>>>> Apache Iceberg, as a table format project, should recommend >>>>>>>>>> protecting >>>>>>>>>> sensitive information. But Iceberg should not mandate _how_ that >>>>>>>>>> protection is implemented - but the Iceberg REST specification >>>>>>>>>> does >>>>>>>>>> effectively mandate OAuth 2.0, because other Iceberg REST >>>>>>>>>> endpoints do >>>>>>>>>> refer/require OAuth 2.0 specifics. Users that want to use other >>>>>>>>>> mechanisms, because they are forced to do so by their >>>>>>>>>> organization, >>>>>>>>>> would be locked out of Iceberg REST. >>>>>>>>>> >>>>>>>>>> Apache Iceberg should not mandate OAuth 2.0 as the only option - >>>>>>>>>> for the >>>>>>>>>> sake of openness for the project and flexibility for the server >>>>>>>>>> implementations. >>>>>>>>>> >>>>>>>>>> We think that Apache Iceberg REST Catalog spec should not mandate >>>>>>>>>> that a >>>>>>>>>> catalog implementation responds to requests to produce Auth >>>>>>>>>> Tokens >>>>>>>>>> (since the REST spec v1 defines a /v1/tokens endpoint, current >>>>>>>>>> implementations have to take deliberate actions when responding >>>>>>>>>> to those >>>>>>>>>> requests, whether with successful token responses or with “access >>>>>>>>>> denied” or “unsupported” responses). >>>>>>>>>> >>>>>>>>>> We propose the following actions: >>>>>>>>>> 1. Immediate mitigation: >>>>>>>>>> 1.1. Remove the ‘/v1/oauth/tokens’ endpoint entirely from the >>>>>>>>>> Iceberg’s >>>>>>>>>> OpenAPI spec w/o replacement. >>>>>>>>>> 1.2. As long as OAuth2 is the only mechanism supported by the >>>>>>>>>> Iceberg >>>>>>>>>> client, make the existing client parameter “oauth2-server-uri” >>>>>>>>>> mandatory. The Iceberg REST catalog must fail to initialize if >>>>>>>>>> the >>>>>>>>>> “oauth2-server-uri” parameter is not defined. >>>>>>>>>> 1.3. Remove all fallbacks to the ‘/v1/oauth/tokens’ endpoint. >>>>>>>>>> 1.4. Forbid or discourage the communication of tokens from any >>>>>>>>>> Iceberg >>>>>>>>>> REST Catalog endpoint, both via the "token" property or with any >>>>>>>>>> of the >>>>>>>>>> "urn:ietf:params:oauth:token-type:*" properties. >>>>>>>>>> 2. As a follow up: We’d propose a couple of implementation fixes >>>>>>>>>> and >>>>>>>>>> changes and test improvements. >>>>>>>>>> 3. As a follow up: Define a discovery mechanism for both the >>>>>>>>>> Iceberg >>>>>>>>>> REST base URI and OAuth 2.0 endpoints/discovery, which allows >>>>>>>>>> users to >>>>>>>>>> use a single URI to securely access Iceberg REST endpoints. >>>>>>>>>> 4. As a follow up: Not new, but we also want to improve the >>>>>>>>>> Iceberg REST >>>>>>>>>> specification via the “new” REST proposal. >>>>>>>>>> >>>>>>>>>> We do not think that adding recommendations to >>>>>>>>>> inline-documentation is >>>>>>>>>> enough to fully mitigate the above concerns. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> References: >>>>>>>>>> >>>>>>>>>> [1] RFC 6749 - The OAuth 2.0 Authorization Framework, >>>>>>>>>> https://datatracker.ietf.org/doc/html/rfc6749 >>>>>>>>>> [2] Iceberg pull request 4771 - Core: Add OAuth2 to REST catalog >>>>>>>>>> spec - >>>>>>>>>> https://github.com/apache/iceberg/pull/4771 >>>>>>>>>> [3] Iceberg pull request 4843 - Spec: Add more context about >>>>>>>>>> OAuth2 to >>>>>>>>>> the REST spec - https://github.com/apache/iceberg/pull/4843 >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Robert Stupp >>>>>>>>>> @snazy >>>>>>>>>> >>>>>>>>>> -- > Robert Stupp > @snazy > >