Re: [DISCUSS] Describing REST Server capabilities

Jack Ye Tue, 30 Jul 2024 19:57:21 -0700

> are you talking about the endpoint path like "/v1/"

No, I mean in general the catalog spec version, which is marked currently
as 0.0.1:
https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27


And in my mind it would be you periodically release a version of the
catalog spec, just like you release Iceberg java libraries and pyiceberg.
And catalog providers will choose to upgrade to that version, and decide if
anything has changed for their existing support, add new support for new
features, document new restrictions and limitations. Basically just go
through a typical product lifecycle when integrating with the spec.

-Jack


On Tue, Jul 30, 2024 at 12:24 PM Steven Wu <[email protected]> wrote:

> >  (2) version the entire catalog spec. A released catalog spec version
> will contain a list of configs it supports, and also a set of APIs and all
> features embedded in the APIs. A server will report the specific catalog
> version it adheres to, and then document the nuances.
>
> Jack, just to clarify, are you talking about the endpoint path like
> "/v1/"? Also, does that mean every API/feature addition would require a
> catalog version bump?
>
> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye <[email protected]> wrote:
>
>> Since the catalog sync was canceled this week, I find maybe it is better
>> to reply here for my latest take on this topic.
>>
>> I think we have 2 discussions intertwined here, that I would like to
>> decouple if possible.
>>
>> (1) is it worth having a concept of capabilities to control client
>> behaviors?
>> (2) suppose we introduce capabilities, is it worth having versioned
>> capabilities?
>>
>> Personally speaking I am currently still more inclined to not have
>> capabilities. An alternative here is to keep doing what has been done for
>> metrics API, which is to introduce feature flags like
>> rest-metrics-reporting-enabled. One strong argument I saw for this
>> alternative is that a feature flag can express non-binary options. For
>> capabilities, you are bound to say just whether the server has this
>> capability or not. But what we really want is to control client behavior
>> based on the capability. And for that, there could be multiple options for
>> the client to interact with the server in existence/absence of a feature.
>> For example, for multi-table commit, there could be 2 different behaviors
>> when the server does not support the endpoint, (1) fail the operation
>> early, (2) fallback to use single-table commit for each table.
>>
>> And with this alternative, there is of course no versioned capabilities.
>> But I think the reason we want versioned capabilities is because we want a
>> general versioning story for the catalog spec with forward and backward
>> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
>> feature flag configs as a part of the spec, (2) version the entire catalog
>> spec. A released catalog spec version will contain a list of configs it
>> supports, and also a set of APIs and all features embedded in the APIs. A
>> server will report the specific catalog version it adheres to, and then
>> document the nuances. I feel this would put catalog providers in a more
>> comfortable situation, as they now have a stable catalog spec to adhere to
>> as the basis, that does not just automatically evolve within the same
>> version. They can implement a catalog spec and upgrade at their own pace
>> following a common versioning semantics. They will also report whatever
>> level of support and detailed behaviors they want, without the need to tie
>> specific behaviors to different capabilities.
>>
>> I think we have been spending quite a long time on this topic, but this
>> is so fundamental that I feel we should think through the alternatives.
>> Would it be possible to at least document in the design proposal why the
>> alternatives are not desirable, what are the pros and cons?
>>
>> -Jack
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Hey everyone,
>>>
>>> I've written up
>>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>>>  to
>>> provide an easier way of giving feedback to the proposal.
>>> Please take a look so that we can discuss how we'd like to handle the
>>> default fallback behavior (*tables* vs *everything that's currently in
>>> the spec*) when a newer client talks to an older server.
>>>
>>>
>>> Eduard
>>>
>>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>>> <[email protected]> wrote:
>>>
>>>> So I would argue to define the current set of APIs and specs as the
>>>>> default if the `capabilities` field is missing.
>>>>
>>>>
>>>> There have been two sides to this in prior discussions. Having *tables*
>>>> as the default vs having what's *currently in the spec* as the
>>>> default. The argument for having *tables* as the default is because we
>>>> can't assume that every REST server out there already supports views.
>>>>
>>>>
>>>> Can we assume that a server that does not declare capabilities does NOT
>>>> implement views? IMHO, that assumption is too strong and will break use
>>>> cases when the client is upgraded, but the server is not.
>>>>
>>>> Before capabilities were introduced, clients used to work in a certain
>>>> way. I think when the client starts interpreting capabilities, but the
>>>> server does not declare the capabilities property at all, the client should
>>>> (by default) work the same way as when it did not expect capabilities to be
>>>> declared.
>>>>
>>>>
>>>> Hence we're opting for the middle ground with *tables* + having a 
>>>> *configurable
>>>> fallback mechanism*. Servers that already support views can configure
>>>> their clients to default to *tables / views*, meaning that no
>>>> additional (manual) configuration from a client's perspective is required
>>>> to get table & view behavior.
>>>>
>>>>
>>>> Forcing a server upgrade when users just want to upgrade the client is
>>>> too much of a burden, I think. Servers and clients are often managed by
>>>> different groups of people.
>>>>
>>>> In the end, IIRC previous posts in this thread correctly, declaring
>>>> server capabilities is an optimization to allow more efficient / less
>>>> error-prone client operation. I do not think it should impose additional
>>>> functional / interoperability requirements on servers.
>>>>
>>>> Cheers,
>>>> Dmitri.
>>>>
>>>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner <
>>>> [email protected]> wrote:
>>>>
>>>>> Current servers do not send a `capabilities` field at all. You're
>>>>>> suggesting to use a new `rest-default-capabilities` property to let newer
>>>>>> clients assume `1`.  Once the table/view/etc-spec capabilities are 
>>>>>> needed,
>>>>>> those newer clients would assume table-spec v1. That's wrong IMO.
>>>>>
>>>>>
>>>>> That statement I mentioned only applies to the capabilities that are
>>>>> currently in the PR and not to *table-spec / view-spec*.
>>>>>
>>>>>
>>>>> I'm not a fan of a `rest-default-capabilities` property at all,
>>>>>> because every user has to configure it explicitly and correctly
>>>>>>
>>>>>
>>>>> As I mentioned, servers can configure this for *all* of their clients
>>>>> via the *config* endpoint, so clients wouldn't have to do this
>>>>> *manually*.
>>>>>
>>>>>
>>>>> So I would argue to define the current set of APIs and specs as the
>>>>>> default if the `capabilities` field is missing.
>>>>>
>>>>>
>>>>> There have been two sides to this in prior discussions. Having
>>>>> *tables* as the default vs having what's *currently in the spec* as
>>>>> the default. The argument for having *tables* as the default is
>>>>> because we can't assume that every REST server out there already supports
>>>>> views.
>>>>>
>>>>> Hence we're opting for the middle ground with *tables* + having a 
>>>>> *configurable
>>>>> fallback mechanism*. Servers that already support views can configure
>>>>> their clients to default to *tables / views*, meaning that no
>>>>> additional (manual) configuration from a client's perspective is required
>>>>> to get table & view behavior.
>>>>>
>>>>> Eduard
>>>>>
>>>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <[email protected]> wrote:
>>>>>
>>>>>> Sorry, I don't understand the two suggestions, especially when used
>>>>>> in combination. Current servers do not send a `capabilities` field at 
>>>>>> all.
>>>>>> You're suggesting to use a new `rest-default-capabilities` property to 
>>>>>> let
>>>>>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>>>>>> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>>>>>>
>>>>>> I'm not a fan of a `rest-default-capabilities` property at all,
>>>>>> because every user has to configure it explicitly and correctly. I 
>>>>>> predict
>>>>>> quite some users not doing this or not doing it correctly, causing some
>>>>>> trouble that can be prevented. The way things are configured is already
>>>>>> quite complex, and yet adding another option adds more complexity to
>>>>>> Iceberg. So I would argue to define the current set of APIs and specs as
>>>>>> the default if the `capabilities` field is missing.
>>>>>>
>>>>>> Just because the *current* implementation doesn't use
>>>>>> table-spec/view-spec doesn't mean near future clients would need it -
>>>>>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>>>>>> isn't far away either.
>>>>>>
>>>>>> Adding table-spec + view-spec capabilities now saves a lot of
>>>>>> headaches for Iceberg users in the near future.
>>>>>>
>>>>>>
>>>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>>>>>
>>>>>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
>>>>>> later when new requirements/updates get added. The current implementation
>>>>>> wouldn't make any use of these capabilities, so I don't see a good enough
>>>>>> reason to add them at this point.
>>>>>>
>>>>>> The PR currently says: "tables -> default capability in case the
>>>>>>> `capabilities` property doesn't exist or is empty in the response" -
>>>>>>> meaning: the server would _only_ support tables. This phrase in the spec
>>>>>>> proposal effectively removes the view functionality from all currently
>>>>>>> existing Iceberg REST implementations.
>>>>>>
>>>>>>
>>>>>> This is why the configurable fallback mechanism was mentioned in the
>>>>>> Catalog sync, which can be realized with *r*
>>>>>> *est-default-capabilities=tables,views,abc,xyz* (all of them
>>>>>> defaulting to version 1). A server could send that property via the 
>>>>>> config
>>>>>> route without having clients to change anything.
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I still have concerns regarding the missing table-spec/view-spec
>>>>>>> capabilities. Newer clients can send create/update requests with
>>>>>>> requirements/updates of newer Iceberg table/view/udf specs to a server 
>>>>>>> that
>>>>>>> doesn't support those spec versions - the outcome is rather undefined. 
>>>>>>> What
>>>>>>> should a server do? Ignore the unknown fields and requirement/update 
>>>>>>> types
>>>>>>> and hence do what it's potentially _not_ supposed to do? Reply with a 
>>>>>>> then
>>>>>>> ambiguous 501 (is it the endpoint that's not implemented or the request
>>>>>>> content not supported)? Similar, what if a server decides to not support
>>>>>>> for example table-spec v1 and just drop the manifest-file list in a 
>>>>>>> table
>>>>>>> snapshot leading to data loss?
>>>>>>>
>>>>>>> IMO capabilities must contain the table/view/... spec versions
>>>>>>> supported by the server.
>>>>>>>
>>>>>>> There's also the concern about the behavior if the `capabilties`
>>>>>>> field is missing (see
>>>>>>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not
>>>>>>> sure why the comment thread's resolved). The PR currently says: "tables 
>>>>>>> ->
>>>>>>> default capability in case the `capabilities` property doesn't exist or 
>>>>>>> is
>>>>>>> empty in the response" - meaning: the server would _only_ support 
>>>>>>> tables.
>>>>>>> This phrase in the spec proposal effectively removes the view 
>>>>>>> functionality
>>>>>>> from all currently existing Iceberg REST implementations.
>>>>>>>
>>>>>>>
>>>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>>>>>>>
>>>>>>> Are there any other concerns with the proposal or should we start a
>>>>>>> VOTE thread?
>>>>>>>
>>>>>>> Eduard
>>>>>>>
>>>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>>>>> <[email protected]>
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>> something
>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>
>>>>>>>>
>>>>>>>> While a server can control this behavior and instruct the client to
>>>>>>>> use remote signing, technically nothing is preventing a client from
>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it
>>>>>>>> seems more appropriate to indicate that this capability isn't supported
>>>>>>>> rather than a generic 501, because not every server will support remote
>>>>>>>> signing.
>>>>>>>>
>>>>>>>>
>>>>>>>> Good point regarding clients taking initiative and using request
>>>>>>>> singing without an explicit server-provided config. It moves the client
>>>>>>>> operations into a mode where the server has more control (over having
>>>>>>>> longer term client-side credentials), so it looks like a reasonable 
>>>>>>>> mode to
>>>>>>>> support from the security perspective.
>>>>>>>>
>>>>>>>> Let's keep that capability flag.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Dmitri.
>>>>>>>>
>>>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey everyone,
>>>>>>>>>
>>>>>>>>> I've added a few inline comments below.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>>> something
>>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> While a server can control this behavior and instruct the client
>>>>>>>>> to use remote signing, technically nothing is preventing a client from
>>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it
>>>>>>>>> seems more appropriate to indicate that this capability isn't 
>>>>>>>>> supported
>>>>>>>>> rather than a generic 501, because not every server will support 
>>>>>>>>> remote
>>>>>>>>> signing.
>>>>>>>>>
>>>>>>>>> The *vended-credentials* capability on the other hand is more
>>>>>>>>> informative in its nature and a server indeed configures a client. I 
>>>>>>>>> think
>>>>>>>>> that was also one of the reasons I removed this capability but added 
>>>>>>>>> it
>>>>>>>>> later back due to a comment from Jack.
>>>>>>>>>
>>>>>>>>> I'm ok either way in terms of removing / keeping
>>>>>>>>> *vended-credentials* as a capability but given that we'd want to
>>>>>>>>> include *actionable* capabilities at this point, I'd just remove
>>>>>>>>> it (nothing is preventing us from adding it later if necessary).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In that case, why do we need all these other capabilities like
>>>>>>>>>> tables, remote-signing, etc. in the first place?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Given that capabilities also carry versioning information, clients
>>>>>>>>> can make more informed decisions on which endpoints to call. One could
>>>>>>>>> argue that generally throwing a 501 on everything that isn't supported
>>>>>>>>> might be sufficient, but that doesn't necessarily help a client in 
>>>>>>>>> knowing
>>>>>>>>> which versions of a capability are safe to call/use.
>>>>>>>>>
>>>>>>>>> Regarding the control of client-side fallback behavior:
>>>>>>>>> I think the default fallback behavior should be *tables* (with
>>>>>>>>> version 1) with a property in the REST catalog that allows 
>>>>>>>>> configuring this
>>>>>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of
>>>>>>>>> them defaulting to version 1).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Eduard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria so
>>>>>>>>>> far is that capabilities are used for:
>>>>>>>>>> - controlling client-side fallback behavior
>>>>>>>>>> - failing expensive operations early if we know it will
>>>>>>>>>> eventually fail due to missing capability
>>>>>>>>>>
>>>>>>>>>> Do we agree if this is the criteria we should use? What about the
>>>>>>>>>> other capabilities, namly tables, remote-signing, credential-vending?
>>>>>>>>>>
>>>>>>>>>> -Jack
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
>>>>>>>>>> <[email protected]> <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>>>>>
>>>>>>>>>>> I think that it does in other cases. Multi-table commits, for
>>>>>>>>>>> example, are a building block for multi-statement transactions. If a
>>>>>>>>>>> service doesn't support multi-table commits then we ideally want 
>>>>>>>>>>> clients to
>>>>>>>>>>> know that ahead of time so that they don't run a big transaction 
>>>>>>>>>>> and then
>>>>>>>>>>> fail because the commit is not supported.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>>>>>>>> <[email protected]>
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>>>>> something
>>>>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I was reconciling the discussion yesterday, one point that was
>>>>>>>>>>>>> interesting to me was that we agreed the purpose of these 
>>>>>>>>>>>>> capabilities is
>>>>>>>>>>>>> to "control client-side fallback behavior", or at least the 
>>>>>>>>>>>>> client should
>>>>>>>>>>>>> behave differently based on these capabilities. However, this 
>>>>>>>>>>>>> seems to be
>>>>>>>>>>>>> only needed so far for views, or more specifically, for loadView 
>>>>>>>>>>>>> API only
>>>>>>>>>>>>> because it impacts the fallback behavior to resolve the 
>>>>>>>>>>>>> identifier as a
>>>>>>>>>>>>> table or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For all the other capabilities listed, and even the other
>>>>>>>>>>>>> endpoints in view, because a server can decide to implement it 
>>>>>>>>>>>>> partially
>>>>>>>>>>>>> anyway and just document the behavior, does it make a difference 
>>>>>>>>>>>>> if I
>>>>>>>>>>>>> declare the capability or not? The client will not stop the 
>>>>>>>>>>>>> request, the
>>>>>>>>>>>>> server will just error out if it is not supported. Maybe the 
>>>>>>>>>>>>> error is not
>>>>>>>>>>>>> in the expected code or message, but it is still an error. In 
>>>>>>>>>>>>> that case,
>>>>>>>>>>>>> why do we need all these other capabilities like tables, 
>>>>>>>>>>>>> remote-signing,
>>>>>>>>>>>>> etc. in the first place?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help
>>>>>>>>>>>>> describe how the other capabilities could be used beyond 
>>>>>>>>>>>>> potentially
>>>>>>>>>>>>> returning an error earlier?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jack
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> > I've also added the 501 error to the response of the
>>>>>>>>>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET
>>>>>>>>>>>>>> *requests must not return a 501
>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>>>> (this
>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My reading on the Mozilla page makes me think that it is
>>>>>>>>>>>>>> phrased too narrowly. Reading RFC 2616 [1] I believe that it 
>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>> preclude responding with 501 to GET and HEAD requests. I think 
>>>>>>>>>>>>>> it means
>>>>>>>>>>>>>> that GET and HEAD methods must be supported by "general purpose" 
>>>>>>>>>>>>>> servers.
>>>>>>>>>>>>>> The Iceberg REST server is not a general purpose server for 
>>>>>>>>>>>>>> resources. So,
>>>>>>>>>>>>>> I think it should be fine to respond with 501 to unimplemented 
>>>>>>>>>>>>>> endpoints.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I watched the catalog sync recording today and updated the
>>>>>>>>>>>>>>> PR <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>>>>>>>> fine-grained capabilities like *register-table /
>>>>>>>>>>>>>>> table-metrics*.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The current capabilities (with versioning information) in
>>>>>>>>>>>>>>> the PR are:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - tables
>>>>>>>>>>>>>>>    - views
>>>>>>>>>>>>>>>    - remote-signing
>>>>>>>>>>>>>>>    - vended-credentials
>>>>>>>>>>>>>>>    - multi-table-commit
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For servers that only *partially* implement endpoints under
>>>>>>>>>>>>>>> a capability the spec requires the server to throw a *501
>>>>>>>>>>>>>>> Not Implemented*. I've also added the 501 error to the
>>>>>>>>>>>>>>> response of the respective endpoints but worth mentioning that
>>>>>>>>>>>>>>> *HEAD* / *GET *requests must not return a 501
>>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>>>>> (this
>>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Eduard
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't
>>>>>>>>>>>>>>>> implement all
>>>>>>>>>>>>>>>> endpoints. It means that the server will at least have to
>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think we should focus on only "identified capabilities".
>>>>>>>>>>>>>>>> I think
>>>>>>>>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm
>>>>>>>>>>>>>>>> afraid we
>>>>>>>>>>>>>>>> won't be flexible enough or always behind the
>>>>>>>>>>>>>>>> implementation (if an
>>>>>>>>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the
>>>>>>>>>>>>>>>> requirement on having to implement all endpoints under a 
>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>> capability.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > For servers that only partially implement endpoints under
>>>>>>>>>>>>>>>> a capability the spec requires the server to throw a 501 Not 
>>>>>>>>>>>>>>>> Implemented.
>>>>>>>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>>>>>>>> capability: I think this might make sense for the next 
>>>>>>>>>>>>>>>> iteration of the
>>>>>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear 
>>>>>>>>>>>>>>>> benefit for the
>>>>>>>>>>>>>>>> current REST spec as the client wouldn't do anything with that 
>>>>>>>>>>>>>>>> information.
>>>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this can
>>>>>>>>>>>>>>>> still be added later to the current REST spec but I believe we 
>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> rather have a few well-defined and actionable capabilities 
>>>>>>>>>>>>>>>> rather than too
>>>>>>>>>>>>>>>> many.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Eduard
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss.
>>>>>>>>>>>>>>>> Robert, how do you envision this to be used?
>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>>>> are 2 approaches we can go with, taking CreateTable as an 
>>>>>>>>>>>>>>>> example:
>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which
>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be 
>>>>>>>>>>>>>>>> created and
>>>>>>>>>>>>>>>> allow creating tables in the v3 version.
>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>>>>>> differently
>>>>>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some 
>>>>>>>>>>>>>>>> overhead in
>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current
>>>>>>>>>>>>>>>> behavior, since /v1/tables operation supports both v1 and v3. 
>>>>>>>>>>>>>>>> We should
>>>>>>>>>>>>>>>> only go to 2 only when we have incompatible fields/break 
>>>>>>>>>>>>>>>> changes according
>>>>>>>>>>>>>>>> to discussion.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> Generally I agree with adding table-spec into
>>>>>>>>>>>>>>>> capabilities. For example, we can expose this to user in api 
>>>>>>>>>>>>>>>> so that user
>>>>>>>>>>>>>>>> could choose a supported table format version without throwing 
>>>>>>>>>>>>>>>> exception.
>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss.
>>>>>>>>>>>>>>>> Robert, how do you envision this to be used?
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>>>> are 2 approaches we can go with, taking CreateTable as an 
>>>>>>>>>>>>>>>> example:
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which
>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be 
>>>>>>>>>>>>>>>> created and
>>>>>>>>>>>>>>>> allow creating tables in the v3 version.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>>>>>> differently
>>>>>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some 
>>>>>>>>>>>>>>>> overhead in
>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting
>>>>>>>>>>>>>>>> yesterday but I watched the recording today (thanks for 
>>>>>>>>>>>>>>>> providing that).
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware)
>>>>>>>>>>>>>>>> clients handle the case when a service does _not_ return the 
>>>>>>>>>>>>>>>> capabilities
>>>>>>>>>>>>>>>> field (absent). My proposal would be that a client should in 
>>>>>>>>>>>>>>>> this case
>>>>>>>>>>>>>>>> assume that all _currently_ existing capabilities are 
>>>>>>>>>>>>>>>> supported.
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the
>>>>>>>>>>>>>>>> current PR uses the tables capability (as version 1) as the 
>>>>>>>>>>>>>>>> default when a
>>>>>>>>>>>>>>>> server doesn't return capabilities but it might be also ok to 
>>>>>>>>>>>>>>>> include views
>>>>>>>>>>>>>>>> (as version 1) because the current client impl has some code 
>>>>>>>>>>>>>>>> to deal with
>>>>>>>>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing
>>>>>>>>>>>>>>>> functionality in the REST spec is the default behavior to be 
>>>>>>>>>>>>>>>> assumed for
>>>>>>>>>>>>>>>> older server, I'm not sure about including remote-signing /
>>>>>>>>>>>>>>>> multi-table-commit / register-table / table-metrics as it has 
>>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> indicated in earlier comments on the PR/ML that not every REST 
>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>> supports these.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>>>>>>>>> default behavior (when an older server doesn't send back 
>>>>>>>>>>>>>>>> capabilities) to be
>>>>>>>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in
>>>>>>>>>>>>>>>> the REST spec (as version 1)
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec
>>>>>>>>>>>>>>>> seems to be more informative in its nature as I don't think a 
>>>>>>>>>>>>>>>> client would
>>>>>>>>>>>>>>>> act differently right now when seeing these.
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>>> >>>> Eduard
>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ryan Blue
>>>>>>>>>>> Databricks
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>> Robert Stupp
>>>>>>> @snazy
>>>>>>>
>>>>>>> --
>>>>>> Robert Stupp
>>>>>> @snazy
>>>>>>
>>>>>>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to