Re: [DISCUSS] Describing REST Server capabilities

Jack Ye Wed, 31 Jul 2024 07:07:15 -0700

One thing to clarify, regarding per-endpoint versioning, my understanding
is that endpoint version should bump (e.g. GET /v1/namespaces to GET
/v2/namespaces) when there is a significant backwards incompatible change.


-Jack

On Tue, Jul 30, 2024 at 7:56 PM Jack Ye <yezhao...@gmail.com> wrote:

> > are you talking about the endpoint path like "/v1/"
>
> No, I mean in general the catalog spec version, which is marked currently
> as 0.0.1:
> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27
>
> And in my mind it would be you periodically release a version of the
> catalog spec, just like you release Iceberg java libraries and pyiceberg.
> And catalog providers will choose to upgrade to that version, and decide if
> anything has changed for their existing support, add new support for new
> features, document new restrictions and limitations. Basically just go
> through a typical product lifecycle when integrating with the spec.
>
> -Jack
>
>
> On Tue, Jul 30, 2024 at 12:24 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> >  (2) version the entire catalog spec. A released catalog spec version
>> will contain a list of configs it supports, and also a set of APIs and all
>> features embedded in the APIs. A server will report the specific catalog
>> version it adheres to, and then document the nuances.
>>
>> Jack, just to clarify, are you talking about the endpoint path like
>> "/v1/"? Also, does that mean every API/feature addition would require a
>> catalog version bump?
>>
>> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Since the catalog sync was canceled this week, I find maybe it is better
>>> to reply here for my latest take on this topic.
>>>
>>> I think we have 2 discussions intertwined here, that I would like to
>>> decouple if possible.
>>>
>>> (1) is it worth having a concept of capabilities to control client
>>> behaviors?
>>> (2) suppose we introduce capabilities, is it worth having versioned
>>> capabilities?
>>>
>>> Personally speaking I am currently still more inclined to not have
>>> capabilities. An alternative here is to keep doing what has been done for
>>> metrics API, which is to introduce feature flags like
>>> rest-metrics-reporting-enabled. One strong argument I saw for this
>>> alternative is that a feature flag can express non-binary options. For
>>> capabilities, you are bound to say just whether the server has this
>>> capability or not. But what we really want is to control client behavior
>>> based on the capability. And for that, there could be multiple options for
>>> the client to interact with the server in existence/absence of a feature.
>>> For example, for multi-table commit, there could be 2 different behaviors
>>> when the server does not support the endpoint, (1) fail the operation
>>> early, (2) fallback to use single-table commit for each table.
>>>
>>> And with this alternative, there is of course no versioned capabilities.
>>> But I think the reason we want versioned capabilities is because we want a
>>> general versioning story for the catalog spec with forward and backward
>>> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
>>> feature flag configs as a part of the spec, (2) version the entire catalog
>>> spec. A released catalog spec version will contain a list of configs it
>>> supports, and also a set of APIs and all features embedded in the APIs. A
>>> server will report the specific catalog version it adheres to, and then
>>> document the nuances. I feel this would put catalog providers in a more
>>> comfortable situation, as they now have a stable catalog spec to adhere to
>>> as the basis, that does not just automatically evolve within the same
>>> version. They can implement a catalog spec and upgrade at their own pace
>>> following a common versioning semantics. They will also report whatever
>>> level of support and detailed behaviors they want, without the need to tie
>>> specific behaviors to different capabilities.
>>>
>>> I think we have been spending quite a long time on this topic, but this
>>> is so fundamental that I feel we should think through the alternatives.
>>> Would it be possible to at least document in the design proposal why the
>>> alternatives are not desirable, what are the pros and cons?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner <
>>> etudenhoef...@apache.org> wrote:
>>>
>>>> Hey everyone,
>>>>
>>>> I've written up
>>>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>>>>  to
>>>> provide an easier way of giving feedback to the proposal.
>>>> Please take a look so that we can discuss how we'd like to handle the
>>>> default fallback behavior (*tables* vs *everything that's currently in
>>>> the spec*) when a newer client talks to an older server.
>>>>
>>>>
>>>> Eduard
>>>>
>>>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>
>>>>> So I would argue to define the current set of APIs and specs as the
>>>>>> default if the `capabilities` field is missing.
>>>>>
>>>>>
>>>>> There have been two sides to this in prior discussions. Having
>>>>> *tables* as the default vs having what's *currently in the spec* as
>>>>> the default. The argument for having *tables* as the default is
>>>>> because we can't assume that every REST server out there already supports
>>>>> views.
>>>>>
>>>>>
>>>>> Can we assume that a server that does not declare capabilities does
>>>>> NOT implement views? IMHO, that assumption is too strong and will break 
>>>>> use
>>>>> cases when the client is upgraded, but the server is not.
>>>>>
>>>>> Before capabilities were introduced, clients used to work in a certain
>>>>> way. I think when the client starts interpreting capabilities, but the
>>>>> server does not declare the capabilities property at all, the client 
>>>>> should
>>>>> (by default) work the same way as when it did not expect capabilities to 
>>>>> be
>>>>> declared.
>>>>>
>>>>>
>>>>> Hence we're opting for the middle ground with *tables* + having a 
>>>>> *configurable
>>>>> fallback mechanism*. Servers that already support views can configure
>>>>> their clients to default to *tables / views*, meaning that no
>>>>> additional (manual) configuration from a client's perspective is required
>>>>> to get table & view behavior.
>>>>>
>>>>>
>>>>> Forcing a server upgrade when users just want to upgrade the client is
>>>>> too much of a burden, I think. Servers and clients are often managed by
>>>>> different groups of people.
>>>>>
>>>>> In the end, IIRC previous posts in this thread correctly, declaring
>>>>> server capabilities is an optimization to allow more efficient / less
>>>>> error-prone client operation. I do not think it should impose additional
>>>>> functional / interoperability requirements on servers.
>>>>>
>>>>> Cheers,
>>>>> Dmitri.
>>>>>
>>>>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner <
>>>>> etudenhoef...@apache.org> wrote:
>>>>>
>>>>>> Current servers do not send a `capabilities` field at all. You're
>>>>>>> suggesting to use a new `rest-default-capabilities` property to let 
>>>>>>> newer
>>>>>>> clients assume `1`.  Once the table/view/etc-spec capabilities are 
>>>>>>> needed,
>>>>>>> those newer clients would assume table-spec v1. That's wrong IMO.
>>>>>>
>>>>>>
>>>>>> That statement I mentioned only applies to the capabilities that are
>>>>>> currently in the PR and not to *table-spec / view-spec*.
>>>>>>
>>>>>>
>>>>>> I'm not a fan of a `rest-default-capabilities` property at all,
>>>>>>> because every user has to configure it explicitly and correctly
>>>>>>>
>>>>>>
>>>>>> As I mentioned, servers can configure this for *all* of their
>>>>>> clients via the *config* endpoint, so clients wouldn't have to do
>>>>>> this *manually*.
>>>>>>
>>>>>>
>>>>>> So I would argue to define the current set of APIs and specs as the
>>>>>>> default if the `capabilities` field is missing.
>>>>>>
>>>>>>
>>>>>> There have been two sides to this in prior discussions. Having
>>>>>> *tables* as the default vs having what's *currently in the spec* as
>>>>>> the default. The argument for having *tables* as the default is
>>>>>> because we can't assume that every REST server out there already supports
>>>>>> views.
>>>>>>
>>>>>> Hence we're opting for the middle ground with *tables* + having a 
>>>>>> *configurable
>>>>>> fallback mechanism*. Servers that already support views can
>>>>>> configure their clients to default to *tables / views*, meaning that
>>>>>> no additional (manual) configuration from a client's perspective is
>>>>>> required to get table & view behavior.
>>>>>>
>>>>>> Eduard
>>>>>>
>>>>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote:
>>>>>>
>>>>>>> Sorry, I don't understand the two suggestions, especially when used
>>>>>>> in combination. Current servers do not send a `capabilities` field at 
>>>>>>> all.
>>>>>>> You're suggesting to use a new `rest-default-capabilities` property to 
>>>>>>> let
>>>>>>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>>>>>>> needed, those newer clients would assume table-spec v1. That's wrong 
>>>>>>> IMO.
>>>>>>>
>>>>>>> I'm not a fan of a `rest-default-capabilities` property at all,
>>>>>>> because every user has to configure it explicitly and correctly. I 
>>>>>>> predict
>>>>>>> quite some users not doing this or not doing it correctly, causing some
>>>>>>> trouble that can be prevented. The way things are configured is already
>>>>>>> quite complex, and yet adding another option adds more complexity to
>>>>>>> Iceberg. So I would argue to define the current set of APIs and specs as
>>>>>>> the default if the `capabilities` field is missing.
>>>>>>>
>>>>>>> Just because the *current* implementation doesn't use
>>>>>>> table-spec/view-spec doesn't mean near future clients would need it -
>>>>>>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>>>>>>> isn't far away either.
>>>>>>>
>>>>>>> Adding table-spec + view-spec capabilities now saves a lot of
>>>>>>> headaches for Iceberg users in the near future.
>>>>>>>
>>>>>>>
>>>>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>>>>>>
>>>>>>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
>>>>>>> later when new requirements/updates get added. The current 
>>>>>>> implementation
>>>>>>> wouldn't make any use of these capabilities, so I don't see a good 
>>>>>>> enough
>>>>>>> reason to add them at this point.
>>>>>>>
>>>>>>> The PR currently says: "tables -> default capability in case the
>>>>>>>> `capabilities` property doesn't exist or is empty in the response" -
>>>>>>>> meaning: the server would _only_ support tables. This phrase in the 
>>>>>>>> spec
>>>>>>>> proposal effectively removes the view functionality from all currently
>>>>>>>> existing Iceberg REST implementations.
>>>>>>>
>>>>>>>
>>>>>>> This is why the configurable fallback mechanism was mentioned in the
>>>>>>> Catalog sync, which can be realized with *r*
>>>>>>> *est-default-capabilities=tables,views,abc,xyz* (all of them
>>>>>>> defaulting to version 1). A server could send that property via the 
>>>>>>> config
>>>>>>> route without having clients to change anything.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I still have concerns regarding the missing table-spec/view-spec
>>>>>>>> capabilities. Newer clients can send create/update requests with
>>>>>>>> requirements/updates of newer Iceberg table/view/udf specs to a server 
>>>>>>>> that
>>>>>>>> doesn't support those spec versions - the outcome is rather undefined. 
>>>>>>>> What
>>>>>>>> should a server do? Ignore the unknown fields and requirement/update 
>>>>>>>> types
>>>>>>>> and hence do what it's potentially _not_ supposed to do? Reply with a 
>>>>>>>> then
>>>>>>>> ambiguous 501 (is it the endpoint that's not implemented or the request
>>>>>>>> content not supported)? Similar, what if a server decides to not 
>>>>>>>> support
>>>>>>>> for example table-spec v1 and just drop the manifest-file list in a 
>>>>>>>> table
>>>>>>>> snapshot leading to data loss?
>>>>>>>>
>>>>>>>> IMO capabilities must contain the table/view/... spec versions
>>>>>>>> supported by the server.
>>>>>>>>
>>>>>>>> There's also the concern about the behavior if the `capabilties`
>>>>>>>> field is missing (see
>>>>>>>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not
>>>>>>>> sure why the comment thread's resolved). The PR currently says: 
>>>>>>>> "tables ->
>>>>>>>> default capability in case the `capabilities` property doesn't exist 
>>>>>>>> or is
>>>>>>>> empty in the response" - meaning: the server would _only_ support 
>>>>>>>> tables.
>>>>>>>> This phrase in the spec proposal effectively removes the view 
>>>>>>>> functionality
>>>>>>>> from all currently existing Iceberg REST implementations.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>>>>>>>>
>>>>>>>> Are there any other concerns with the proposal or should we start a
>>>>>>>> VOTE thread?
>>>>>>>>
>>>>>>>> Eduard
>>>>>>>>
>>>>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>>> something
>>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> While a server can control this behavior and instruct the client
>>>>>>>>> to use remote signing, technically nothing is preventing a client from
>>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it
>>>>>>>>> seems more appropriate to indicate that this capability isn't 
>>>>>>>>> supported
>>>>>>>>> rather than a generic 501, because not every server will support 
>>>>>>>>> remote
>>>>>>>>> signing.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Good point regarding clients taking initiative and using request
>>>>>>>>> singing without an explicit server-provided config. It moves the 
>>>>>>>>> client
>>>>>>>>> operations into a mode where the server has more control (over having
>>>>>>>>> longer term client-side credentials), so it looks like a reasonable 
>>>>>>>>> mode to
>>>>>>>>> support from the security perspective.
>>>>>>>>>
>>>>>>>>> Let's keep that capability flag.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Dmitri.
>>>>>>>>>
>>>>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> I've added a few inline comments below.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>>>> something
>>>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> While a server can control this behavior and instruct the client
>>>>>>>>>> to use remote signing, technically nothing is preventing a client 
>>>>>>>>>> from
>>>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it
>>>>>>>>>> seems more appropriate to indicate that this capability isn't 
>>>>>>>>>> supported
>>>>>>>>>> rather than a generic 501, because not every server will support 
>>>>>>>>>> remote
>>>>>>>>>> signing.
>>>>>>>>>>
>>>>>>>>>> The *vended-credentials* capability on the other hand is more
>>>>>>>>>> informative in its nature and a server indeed configures a client. I 
>>>>>>>>>> think
>>>>>>>>>> that was also one of the reasons I removed this capability but added 
>>>>>>>>>> it
>>>>>>>>>> later back due to a comment from Jack.
>>>>>>>>>>
>>>>>>>>>> I'm ok either way in terms of removing / keeping
>>>>>>>>>> *vended-credentials* as a capability but given that we'd want to
>>>>>>>>>> include *actionable* capabilities at this point, I'd just remove
>>>>>>>>>> it (nothing is preventing us from adding it later if necessary).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In that case, why do we need all these other capabilities like
>>>>>>>>>>> tables, remote-signing, etc. in the first place?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Given that capabilities also carry versioning information,
>>>>>>>>>> clients can make more informed decisions on which endpoints to call. 
>>>>>>>>>> One
>>>>>>>>>> could argue that generally throwing a 501 on everything that isn't
>>>>>>>>>> supported might be sufficient, but that doesn't necessarily help a 
>>>>>>>>>> client
>>>>>>>>>> in knowing which versions of a capability are safe to call/use.
>>>>>>>>>>
>>>>>>>>>> Regarding the control of client-side fallback behavior:
>>>>>>>>>> I think the default fallback behavior should be *tables* (with
>>>>>>>>>> version 1) with a property in the REST catalog that allows 
>>>>>>>>>> configuring this
>>>>>>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of
>>>>>>>>>> them defaulting to version 1).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Eduard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria
>>>>>>>>>>> so far is that capabilities are used for:
>>>>>>>>>>> - controlling client-side fallback behavior
>>>>>>>>>>> - failing expensive operations early if we know it will
>>>>>>>>>>> eventually fail due to missing capability
>>>>>>>>>>>
>>>>>>>>>>> Do we agree if this is the criteria we should use? What about
>>>>>>>>>>> the other capabilities, namly tables, remote-signing, 
>>>>>>>>>>> credential-vending?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
>>>>>>>>>>> <b...@databricks.com.invalid> <b...@databricks.com.invalid>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>>>>>>
>>>>>>>>>>>> I think that it does in other cases. Multi-table commits, for
>>>>>>>>>>>> example, are a building block for multi-statement transactions. If 
>>>>>>>>>>>> a
>>>>>>>>>>>> service doesn't support multi-table commits then we ideally want 
>>>>>>>>>>>> clients to
>>>>>>>>>>>> know that ahead of time so that they don't run a big transaction 
>>>>>>>>>>>> and then
>>>>>>>>>>>> fail because the commit is not supported.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Re: remote signing, I agree that it does not look like a
>>>>>>>>>>>>> server capability that a client can / should discover. It is more 
>>>>>>>>>>>>> like
>>>>>>>>>>>>> something that the server instructs / configures the client to do.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was reconciling the discussion yesterday, one point that
>>>>>>>>>>>>>> was interesting to me was that we agreed the purpose of these 
>>>>>>>>>>>>>> capabilities
>>>>>>>>>>>>>> is to "control client-side fallback behavior", or at least the 
>>>>>>>>>>>>>> client
>>>>>>>>>>>>>> should behave differently based on these capabilities. However, 
>>>>>>>>>>>>>> this seems
>>>>>>>>>>>>>> to be only needed so far for views, or more specifically, for 
>>>>>>>>>>>>>> loadView API
>>>>>>>>>>>>>> only because it impacts the fallback behavior to resolve the 
>>>>>>>>>>>>>> identifier as
>>>>>>>>>>>>>> a table or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For all the other capabilities listed, and even the other
>>>>>>>>>>>>>> endpoints in view, because a server can decide to implement it 
>>>>>>>>>>>>>> partially
>>>>>>>>>>>>>> anyway and just document the behavior, does it make a difference 
>>>>>>>>>>>>>> if I
>>>>>>>>>>>>>> declare the capability or not? The client will not stop the 
>>>>>>>>>>>>>> request, the
>>>>>>>>>>>>>> server will just error out if it is not supported. Maybe the 
>>>>>>>>>>>>>> error is not
>>>>>>>>>>>>>> in the expected code or message, but it is still an error. In 
>>>>>>>>>>>>>> that case,
>>>>>>>>>>>>>> why do we need all these other capabilities like tables, 
>>>>>>>>>>>>>> remote-signing,
>>>>>>>>>>>>>> etc. in the first place?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help
>>>>>>>>>>>>>> describe how the other capabilities could be used beyond 
>>>>>>>>>>>>>> potentially
>>>>>>>>>>>>>> returning an error earlier?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jack
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > I've also added the 501 error to the response of the
>>>>>>>>>>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET
>>>>>>>>>>>>>>> *requests must not return a 501
>>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>>>>> (this
>>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My reading on the Mozilla page makes me think that it is
>>>>>>>>>>>>>>> phrased too narrowly. Reading RFC 2616 [1] I believe that it 
>>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>>> preclude responding with 501 to GET and HEAD requests. I think 
>>>>>>>>>>>>>>> it means
>>>>>>>>>>>>>>> that GET and HEAD methods must be supported by "general 
>>>>>>>>>>>>>>> purpose" servers.
>>>>>>>>>>>>>>> The Iceberg REST server is not a general purpose server for 
>>>>>>>>>>>>>>> resources. So,
>>>>>>>>>>>>>>> I think it should be fine to respond with 501 to unimplemented 
>>>>>>>>>>>>>>> endpoints.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I watched the catalog sync recording today and updated the
>>>>>>>>>>>>>>>> PR <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>>>>>>>>> fine-grained capabilities like *register-table /
>>>>>>>>>>>>>>>> table-metrics*.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The current capabilities (with versioning information) in
>>>>>>>>>>>>>>>> the PR are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - tables
>>>>>>>>>>>>>>>>    - views
>>>>>>>>>>>>>>>>    - remote-signing
>>>>>>>>>>>>>>>>    - vended-credentials
>>>>>>>>>>>>>>>>    - multi-table-commit
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For servers that only *partially* implement endpoints
>>>>>>>>>>>>>>>> under a capability the spec requires the server to throw a *501
>>>>>>>>>>>>>>>> Not Implemented*. I've also added the 501 error to the
>>>>>>>>>>>>>>>> response of the respective endpoints but worth mentioning that
>>>>>>>>>>>>>>>> *HEAD* / *GET *requests must not return a 501
>>>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>>>>>> (this
>>>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Eduard
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't
>>>>>>>>>>>>>>>>> implement all
>>>>>>>>>>>>>>>>> endpoints. It means that the server will at least have to
>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think we should focus on only "identified capabilities".
>>>>>>>>>>>>>>>>> I think
>>>>>>>>>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm
>>>>>>>>>>>>>>>>> afraid we
>>>>>>>>>>>>>>>>> won't be flexible enough or always behind the
>>>>>>>>>>>>>>>>> implementation (if an
>>>>>>>>>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the
>>>>>>>>>>>>>>>>> requirement on having to implement all endpoints under a 
>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>> capability.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > For servers that only partially implement endpoints
>>>>>>>>>>>>>>>>> under a capability the spec requires the server to throw a 
>>>>>>>>>>>>>>>>> 501 Not
>>>>>>>>>>>>>>>>> Implemented. This was suggested by Jack and it seems 
>>>>>>>>>>>>>>>>> reasonable to do that.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>>>>>>>>> capability: I think this might make sense for the next 
>>>>>>>>>>>>>>>>> iteration of the
>>>>>>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear 
>>>>>>>>>>>>>>>>> benefit for the
>>>>>>>>>>>>>>>>> current REST spec as the client wouldn't do anything with 
>>>>>>>>>>>>>>>>> that information.
>>>>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this
>>>>>>>>>>>>>>>>> can still be added later to the current REST spec but I 
>>>>>>>>>>>>>>>>> believe we should
>>>>>>>>>>>>>>>>> rather have a few well-defined and actionable capabilities 
>>>>>>>>>>>>>>>>> rather than too
>>>>>>>>>>>>>>>>> many.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Eduard
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss.
>>>>>>>>>>>>>>>>> Robert, how do you envision this to be used?
>>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched,
>>>>>>>>>>>>>>>>> there are 2 approaches we can go with, taking CreateTable as 
>>>>>>>>>>>>>>>>> an example:
>>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which
>>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be 
>>>>>>>>>>>>>>>>> created and
>>>>>>>>>>>>>>>>> allow creating tables in the v3 version.
>>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to
>>>>>>>>>>>>>>>>> support both v2 and v3 fields, and the server enforces the 
>>>>>>>>>>>>>>>>> payload
>>>>>>>>>>>>>>>>> differently based on the TableMetadata.format-version field. 
>>>>>>>>>>>>>>>>> If the server
>>>>>>>>>>>>>>>>> does not support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some 
>>>>>>>>>>>>>>>>> overhead in
>>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current
>>>>>>>>>>>>>>>>> behavior, since /v1/tables operation supports both v1 and v3. 
>>>>>>>>>>>>>>>>> We should
>>>>>>>>>>>>>>>>> only go to 2 only when we have incompatible fields/break 
>>>>>>>>>>>>>>>>> changes according
>>>>>>>>>>>>>>>>> to discussion.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Generally I agree with adding table-spec into
>>>>>>>>>>>>>>>>> capabilities. For example, we can expose this to user in api 
>>>>>>>>>>>>>>>>> so that user
>>>>>>>>>>>>>>>>> could choose a supported table format version without 
>>>>>>>>>>>>>>>>> throwing exception.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <
>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss.
>>>>>>>>>>>>>>>>> Robert, how do you envision this to be used?
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched,
>>>>>>>>>>>>>>>>> there are 2 approaches we can go with, taking CreateTable as 
>>>>>>>>>>>>>>>>> an example:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which
>>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be 
>>>>>>>>>>>>>>>>> created and
>>>>>>>>>>>>>>>>> allow creating tables in the v3 version.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to
>>>>>>>>>>>>>>>>> support both v2 and v3 fields, and the server enforces the 
>>>>>>>>>>>>>>>>> payload
>>>>>>>>>>>>>>>>> differently based on the TableMetadata.format-version field. 
>>>>>>>>>>>>>>>>> If the server
>>>>>>>>>>>>>>>>> does not support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some 
>>>>>>>>>>>>>>>>> overhead in
>>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid>
>>>>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting
>>>>>>>>>>>>>>>>> yesterday but I watched the recording today (thanks for 
>>>>>>>>>>>>>>>>> providing that).
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware)
>>>>>>>>>>>>>>>>> clients handle the case when a service does _not_ return the 
>>>>>>>>>>>>>>>>> capabilities
>>>>>>>>>>>>>>>>> field (absent). My proposal would be that a client should in 
>>>>>>>>>>>>>>>>> this case
>>>>>>>>>>>>>>>>> assume that all _currently_ existing capabilities are 
>>>>>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the
>>>>>>>>>>>>>>>>> current PR uses the tables capability (as version 1) as the 
>>>>>>>>>>>>>>>>> default when a
>>>>>>>>>>>>>>>>> server doesn't return capabilities but it might be also ok to 
>>>>>>>>>>>>>>>>> include views
>>>>>>>>>>>>>>>>> (as version 1) because the current client impl has some code 
>>>>>>>>>>>>>>>>> to deal with
>>>>>>>>>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing
>>>>>>>>>>>>>>>>> functionality in the REST spec is the default behavior to be 
>>>>>>>>>>>>>>>>> assumed for
>>>>>>>>>>>>>>>>> older server, I'm not sure about including remote-signing /
>>>>>>>>>>>>>>>>> multi-table-commit / register-table / table-metrics as it has 
>>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>>> indicated in earlier comments on the PR/ML that not every 
>>>>>>>>>>>>>>>>> REST server
>>>>>>>>>>>>>>>>> supports these.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want
>>>>>>>>>>>>>>>>> the default behavior (when an older server doesn't send back 
>>>>>>>>>>>>>>>>> capabilities)
>>>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in
>>>>>>>>>>>>>>>>> the REST spec (as version 1)
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec
>>>>>>>>>>>>>>>>> seems to be more informative in its nature as I don't think a 
>>>>>>>>>>>>>>>>> client would
>>>>>>>>>>>>>>>>> act differently right now when seeing these.
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>>>> >>>> Eduard
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Ryan Blue
>>>>>>>>>>>> Databricks
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>> Robert Stupp
>>>>>>>> @snazy
>>>>>>>>
>>>>>>>> --
>>>>>>> Robert Stupp
>>>>>>> @snazy
>>>>>>>
>>>>>>>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to