Re: [DISCUSS] Describing REST Server capabilities

Jack Ye Tue, 30 Jul 2024 08:34:45 -0700

Since the catalog sync was canceled this week, I find maybe it is better to
reply here for my latest take on this topic.


I think we have 2 discussions intertwined here, that I would like to
decouple if possible.

(1) is it worth having a concept of capabilities to control client
behaviors?
(2) suppose we introduce capabilities, is it worth having versioned
capabilities?

Personally speaking I am currently still more inclined to not have
capabilities. An alternative here is to keep doing what has been done for
metrics API, which is to introduce feature flags like
rest-metrics-reporting-enabled. One strong argument I saw for this
alternative is that a feature flag can express non-binary options. For
capabilities, you are bound to say just whether the server has this
capability or not. But what we really want is to control client behavior
based on the capability. And for that, there could be multiple options for
the client to interact with the server in existence/absence of a feature.
For example, for multi-table commit, there could be 2 different behaviors
when the server does not support the endpoint, (1) fail the operation
early, (2) fallback to use single-table commit for each table.

And with this alternative, there is of course no versioned capabilities.
But I think the reason we want versioned capabilities is because we want a
general versioning story for the catalog spec with forward and backward
compatibility guarantees. If that is the goal, why not: (1) acknowledge the
feature flag configs as a part of the spec, (2) version the entire catalog
spec. A released catalog spec version will contain a list of configs it
supports, and also a set of APIs and all features embedded in the APIs. A
server will report the specific catalog version it adheres to, and then
document the nuances. I feel this would put catalog providers in a more
comfortable situation, as they now have a stable catalog spec to adhere to
as the basis, that does not just automatically evolve within the same
version. They can implement a catalog spec and upgrade at their own pace
following a common versioning semantics. They will also report whatever
level of support and detailed behaviors they want, without the need to tie
specific behaviors to different capabilities.

I think we have been spending quite a long time on this topic, but this is
so fundamental that I feel we should think through the alternatives. Would
it be possible to at least document in the design proposal why the
alternatives are not desirable, what are the pros and cons?

-Jack










On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner <etudenhoef...@apache.org>
wrote:

> Hey everyone,
>
> I've written up
> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>  to
> provide an easier way of giving feedback to the proposal.
> Please take a look so that we can discuss how we'd like to handle the
> default fallback behavior (*tables* vs *everything that's currently in
> the spec*) when a newer client talks to an older server.
>
>
> Eduard
>
> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>
>> So I would argue to define the current set of APIs and specs as the
>>> default if the `capabilities` field is missing.
>>
>>
>> There have been two sides to this in prior discussions. Having *tables*
>> as the default vs having what's *currently in the spec* as the default.
>> The argument for having *tables* as the default is because we can't
>> assume that every REST server out there already supports views.
>>
>>
>> Can we assume that a server that does not declare capabilities does NOT
>> implement views? IMHO, that assumption is too strong and will break use
>> cases when the client is upgraded, but the server is not.
>>
>> Before capabilities were introduced, clients used to work in a certain
>> way. I think when the client starts interpreting capabilities, but the
>> server does not declare the capabilities property at all, the client should
>> (by default) work the same way as when it did not expect capabilities to be
>> declared.
>>
>>
>> Hence we're opting for the middle ground with *tables* + having a 
>> *configurable
>> fallback mechanism*. Servers that already support views can configure
>> their clients to default to *tables / views*, meaning that no additional
>> (manual) configuration from a client's perspective is required to get table
>> & view behavior.
>>
>>
>> Forcing a server upgrade when users just want to upgrade the client is
>> too much of a burden, I think. Servers and clients are often managed by
>> different groups of people.
>>
>> In the end, IIRC previous posts in this thread correctly, declaring
>> server capabilities is an optimization to allow more efficient / less
>> error-prone client operation. I do not think it should impose additional
>> functional / interoperability requirements on servers.
>>
>> Cheers,
>> Dmitri.
>>
>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Current servers do not send a `capabilities` field at all. You're
>>>> suggesting to use a new `rest-default-capabilities` property to let newer
>>>> clients assume `1`.  Once the table/view/etc-spec capabilities are needed,
>>>> those newer clients would assume table-spec v1. That's wrong IMO.
>>>
>>>
>>> That statement I mentioned only applies to the capabilities that are
>>> currently in the PR and not to *table-spec / view-spec*.
>>>
>>>
>>> I'm not a fan of a `rest-default-capabilities` property at all, because
>>>> every user has to configure it explicitly and correctly
>>>>
>>>
>>> As I mentioned, servers can configure this for *all* of their clients
>>> via the *config* endpoint, so clients wouldn't have to do this
>>> *manually*.
>>>
>>>
>>> So I would argue to define the current set of APIs and specs as the
>>>> default if the `capabilities` field is missing.
>>>
>>>
>>> There have been two sides to this in prior discussions. Having *tables*
>>> as the default vs having what's *currently in the spec* as the default.
>>> The argument for having *tables* as the default is because we can't
>>> assume that every REST server out there already supports views.
>>>
>>> Hence we're opting for the middle ground with *tables* + having a 
>>> *configurable
>>> fallback mechanism*. Servers that already support views can configure
>>> their clients to default to *tables / views*, meaning that no
>>> additional (manual) configuration from a client's perspective is required
>>> to get table & view behavior.
>>>
>>> Eduard
>>>
>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote:
>>>
>>>> Sorry, I don't understand the two suggestions, especially when used in
>>>> combination. Current servers do not send a `capabilities` field at all.
>>>> You're suggesting to use a new `rest-default-capabilities` property to let
>>>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>>>> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>>>>
>>>> I'm not a fan of a `rest-default-capabilities` property at all, because
>>>> every user has to configure it explicitly and correctly. I predict quite
>>>> some users not doing this or not doing it correctly, causing some trouble
>>>> that can be prevented. The way things are configured is already quite
>>>> complex, and yet adding another option adds more complexity to Iceberg. So
>>>> I would argue to define the current set of APIs and specs as the default if
>>>> the `capabilities` field is missing.
>>>>
>>>> Just because the *current* implementation doesn't use
>>>> table-spec/view-spec doesn't mean near future clients would need it -
>>>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>>>> isn't far away either.
>>>>
>>>> Adding table-spec + view-spec capabilities now saves a lot of headaches
>>>> for Iceberg users in the near future.
>>>>
>>>>
>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>>>
>>>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
>>>> later when new requirements/updates get added. The current implementation
>>>> wouldn't make any use of these capabilities, so I don't see a good enough
>>>> reason to add them at this point.
>>>>
>>>> The PR currently says: "tables -> default capability in case the
>>>>> `capabilities` property doesn't exist or is empty in the response" -
>>>>> meaning: the server would _only_ support tables. This phrase in the spec
>>>>> proposal effectively removes the view functionality from all currently
>>>>> existing Iceberg REST implementations.
>>>>
>>>>
>>>> This is why the configurable fallback mechanism was mentioned in the
>>>> Catalog sync, which can be realized with *r*
>>>> *est-default-capabilities=tables,views,abc,xyz* (all of them
>>>> defaulting to version 1). A server could send that property via the config
>>>> route without having clients to change anything.
>>>>
>>>>
>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I still have concerns regarding the missing table-spec/view-spec
>>>>> capabilities. Newer clients can send create/update requests with
>>>>> requirements/updates of newer Iceberg table/view/udf specs to a server 
>>>>> that
>>>>> doesn't support those spec versions - the outcome is rather undefined. 
>>>>> What
>>>>> should a server do? Ignore the unknown fields and requirement/update types
>>>>> and hence do what it's potentially _not_ supposed to do? Reply with a then
>>>>> ambiguous 501 (is it the endpoint that's not implemented or the request
>>>>> content not supported)? Similar, what if a server decides to not support
>>>>> for example table-spec v1 and just drop the manifest-file list in a table
>>>>> snapshot leading to data loss?
>>>>>
>>>>> IMO capabilities must contain the table/view/... spec versions
>>>>> supported by the server.
>>>>>
>>>>> There's also the concern about the behavior if the `capabilties` field
>>>>> is missing (see
>>>>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not
>>>>> sure why the comment thread's resolved). The PR currently says: "tables ->
>>>>> default capability in case the `capabilities` property doesn't exist or is
>>>>> empty in the response" - meaning: the server would _only_ support tables.
>>>>> This phrase in the spec proposal effectively removes the view 
>>>>> functionality
>>>>> from all currently existing Iceberg REST implementations.
>>>>>
>>>>>
>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>>>>>
>>>>> Are there any other concerns with the proposal or should we start a
>>>>> VOTE thread?
>>>>>
>>>>> Eduard
>>>>>
>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>
>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>> something
>>>>>>> that the server instructs / configures the client to do.
>>>>>>
>>>>>>
>>>>>> While a server can control this behavior and instruct the client to
>>>>>> use remote signing, technically nothing is preventing a client from
>>>>>> configuring s3.remote-signing-enabled=true. In such a case it seems
>>>>>> more appropriate to indicate that this capability isn't supported rather
>>>>>> than a generic 501, because not every server will support remote signing.
>>>>>>
>>>>>>
>>>>>> Good point regarding clients taking initiative and using request
>>>>>> singing without an explicit server-provided config. It moves the client
>>>>>> operations into a mode where the server has more control (over having
>>>>>> longer term client-side credentials), so it looks like a reasonable mode 
>>>>>> to
>>>>>> support from the security perspective.
>>>>>>
>>>>>> Let's keep that capability flag.
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I've added a few inline comments below.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>> something
>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>
>>>>>>>
>>>>>>> While a server can control this behavior and instruct the client to
>>>>>>> use remote signing, technically nothing is preventing a client from
>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it seems
>>>>>>> more appropriate to indicate that this capability isn't supported rather
>>>>>>> than a generic 501, because not every server will support remote 
>>>>>>> signing.
>>>>>>>
>>>>>>> The *vended-credentials* capability on the other hand is more
>>>>>>> informative in its nature and a server indeed configures a client. I 
>>>>>>> think
>>>>>>> that was also one of the reasons I removed this capability but added it
>>>>>>> later back due to a comment from Jack.
>>>>>>>
>>>>>>> I'm ok either way in terms of removing / keeping
>>>>>>> *vended-credentials* as a capability but given that we'd want to
>>>>>>> include *actionable* capabilities at this point, I'd just remove it
>>>>>>> (nothing is preventing us from adding it later if necessary).
>>>>>>>
>>>>>>>
>>>>>>> In that case, why do we need all these other capabilities like
>>>>>>>> tables, remote-signing, etc. in the first place?
>>>>>>>
>>>>>>>
>>>>>>> Given that capabilities also carry versioning information, clients
>>>>>>> can make more informed decisions on which endpoints to call. One could
>>>>>>> argue that generally throwing a 501 on everything that isn't supported
>>>>>>> might be sufficient, but that doesn't necessarily help a client in 
>>>>>>> knowing
>>>>>>> which versions of a capability are safe to call/use.
>>>>>>>
>>>>>>> Regarding the control of client-side fallback behavior:
>>>>>>> I think the default fallback behavior should be *tables* (with
>>>>>>> version 1) with a property in the REST catalog that allows configuring 
>>>>>>> this
>>>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of
>>>>>>> them defaulting to version 1).
>>>>>>>
>>>>>>>
>>>>>>> Eduard
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria so
>>>>>>>> far is that capabilities are used for:
>>>>>>>> - controlling client-side fallback behavior
>>>>>>>> - failing expensive operations early if we know it will eventually
>>>>>>>> fail due to missing capability
>>>>>>>>
>>>>>>>> Do we agree if this is the criteria we should use? What about the
>>>>>>>> other capabilities, namly tables, remote-signing, credential-vending?
>>>>>>>>
>>>>>>>> -Jack
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
>>>>>>>> <b...@databricks.com.invalid> <b...@databricks.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>>>
>>>>>>>>> I think that it does in other cases. Multi-table commits, for
>>>>>>>>> example, are a building block for multi-statement transactions. If a
>>>>>>>>> service doesn't support multi-table commits then we ideally want 
>>>>>>>>> clients to
>>>>>>>>> know that ahead of time so that they don't run a big transaction and 
>>>>>>>>> then
>>>>>>>>> fail because the commit is not supported.
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>>>> something
>>>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Dmitri.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I was reconciling the discussion yesterday, one point that was
>>>>>>>>>>> interesting to me was that we agreed the purpose of these 
>>>>>>>>>>> capabilities is
>>>>>>>>>>> to "control client-side fallback behavior", or at least the client 
>>>>>>>>>>> should
>>>>>>>>>>> behave differently based on these capabilities. However, this seems 
>>>>>>>>>>> to be
>>>>>>>>>>> only needed so far for views, or more specifically, for loadView 
>>>>>>>>>>> API only
>>>>>>>>>>> because it impacts the fallback behavior to resolve the identifier 
>>>>>>>>>>> as a
>>>>>>>>>>> table or not.
>>>>>>>>>>>
>>>>>>>>>>> For all the other capabilities listed, and even the other
>>>>>>>>>>> endpoints in view, because a server can decide to implement it 
>>>>>>>>>>> partially
>>>>>>>>>>> anyway and just document the behavior, does it make a difference if 
>>>>>>>>>>> I
>>>>>>>>>>> declare the capability or not? The client will not stop the 
>>>>>>>>>>> request, the
>>>>>>>>>>> server will just error out if it is not supported. Maybe the error 
>>>>>>>>>>> is not
>>>>>>>>>>> in the expected code or message, but it is still an error. In that 
>>>>>>>>>>> case,
>>>>>>>>>>> why do we need all these other capabilities like tables, 
>>>>>>>>>>> remote-signing,
>>>>>>>>>>> etc. in the first place?
>>>>>>>>>>>
>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help
>>>>>>>>>>> describe how the other capabilities could be used beyond potentially
>>>>>>>>>>> returning an error earlier?
>>>>>>>>>>>
>>>>>>>>>>> -Jack
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid>
>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>
>>>>>>>>>>>> > I've also added the 501 error to the response of the
>>>>>>>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET 
>>>>>>>>>>>> *requests
>>>>>>>>>>>> must not return a 501
>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>> (this
>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in such
>>>>>>>>>>>> a case).
>>>>>>>>>>>>
>>>>>>>>>>>> My reading on the Mozilla page makes me think that it is
>>>>>>>>>>>> phrased too narrowly. Reading RFC 2616 [1] I believe that it does 
>>>>>>>>>>>> not
>>>>>>>>>>>> preclude responding with 501 to GET and HEAD requests. I think it 
>>>>>>>>>>>> means
>>>>>>>>>>>> that GET and HEAD methods must be supported by "general purpose" 
>>>>>>>>>>>> servers.
>>>>>>>>>>>> The Iceberg REST server is not a general purpose server for 
>>>>>>>>>>>> resources. So,
>>>>>>>>>>>> I think it should be fine to respond with 501 to unimplemented 
>>>>>>>>>>>> endpoints.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>>>>>> fine-grained capabilities like *register-table /
>>>>>>>>>>>>> table-metrics*.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current capabilities (with versioning information) in the
>>>>>>>>>>>>> PR are:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - tables
>>>>>>>>>>>>>    - views
>>>>>>>>>>>>>    - remote-signing
>>>>>>>>>>>>>    - vended-credentials
>>>>>>>>>>>>>    - multi-table-commit
>>>>>>>>>>>>>
>>>>>>>>>>>>> For servers that only *partially* implement endpoints under a
>>>>>>>>>>>>> capability the spec requires the server to throw a *501 Not
>>>>>>>>>>>>> Implemented*. I've also added the 501 error to the response
>>>>>>>>>>>>> of the respective endpoints but worth mentioning that *HEAD*
>>>>>>>>>>>>> / *GET *requests must not return a 501
>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> 
>>>>>>>>>>>>> (this
>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Eduard
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't
>>>>>>>>>>>>>> implement all
>>>>>>>>>>>>>> endpoints. It means that the server will at least have to
>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we should focus on only "identified capabilities". I
>>>>>>>>>>>>>> think
>>>>>>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm
>>>>>>>>>>>>>> afraid we
>>>>>>>>>>>>>> won't be flexible enough or always behind the implementation
>>>>>>>>>>>>>> (if an
>>>>>>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the
>>>>>>>>>>>>>> requirement on having to implement all endpoints under a 
>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>> capability.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > For servers that only partially implement endpoints under a
>>>>>>>>>>>>>> capability the spec requires the server to throw a 501 Not 
>>>>>>>>>>>>>> Implemented.
>>>>>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>>>>>> capability: I think this might make sense for the next iteration 
>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear 
>>>>>>>>>>>>>> benefit for the
>>>>>>>>>>>>>> current REST spec as the client wouldn't do anything with that 
>>>>>>>>>>>>>> information.
>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this can
>>>>>>>>>>>>>> still be added later to the current REST spec but I believe we 
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>> rather have a few well-defined and actionable capabilities 
>>>>>>>>>>>>>> rather than too
>>>>>>>>>>>>>> many.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Eduard
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>>>> how do you envision this to be used?
>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>> are 2 approaches we can go with, taking CreateTable as an 
>>>>>>>>>>>>>> example:
>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created 
>>>>>>>>>>>>>> and allow
>>>>>>>>>>>>>> creating tables in the v3 version.
>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>>>> differently
>>>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to
>>>>>>>>>>>>>> be a capability. (1) seems to be cleaner, but has some overhead 
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior,
>>>>>>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should 
>>>>>>>>>>>>>> only go to 2
>>>>>>>>>>>>>> only when we have incompatible fields/break changes according to 
>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> Generally I agree with adding table-spec into
>>>>>>>>>>>>>> capabilities. For example, we can expose this to user in api so 
>>>>>>>>>>>>>> that user
>>>>>>>>>>>>>> could choose a supported table format version without throwing 
>>>>>>>>>>>>>> exception.
>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <
>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>>>> how do you envision this to be used?
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>> are 2 approaches we can go with, taking CreateTable as an 
>>>>>>>>>>>>>> example:
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created 
>>>>>>>>>>>>>> and allow
>>>>>>>>>>>>>> creating tables in the v3 version.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>>>> differently
>>>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to
>>>>>>>>>>>>>> be a capability. (1) seems to be cleaner, but has some overhead 
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>> table-spec version?
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid>
>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday
>>>>>>>>>>>>>> but I watched the recording today (thanks for providing that).
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware)
>>>>>>>>>>>>>> clients handle the case when a service does _not_ return the 
>>>>>>>>>>>>>> capabilities
>>>>>>>>>>>>>> field (absent). My proposal would be that a client should in 
>>>>>>>>>>>>>> this case
>>>>>>>>>>>>>> assume that all _currently_ existing capabilities are supported.
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the
>>>>>>>>>>>>>> current PR uses the tables capability (as version 1) as the 
>>>>>>>>>>>>>> default when a
>>>>>>>>>>>>>> server doesn't return capabilities but it might be also ok to 
>>>>>>>>>>>>>> include views
>>>>>>>>>>>>>> (as version 1) because the current client impl has some code to 
>>>>>>>>>>>>>> deal with
>>>>>>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing
>>>>>>>>>>>>>> functionality in the REST spec is the default behavior to be 
>>>>>>>>>>>>>> assumed for
>>>>>>>>>>>>>> older server, I'm not sure about including remote-signing /
>>>>>>>>>>>>>> multi-table-commit / register-table / table-metrics as it has 
>>>>>>>>>>>>>> been
>>>>>>>>>>>>>> indicated in earlier comments on the PR/ML that not every REST 
>>>>>>>>>>>>>> server
>>>>>>>>>>>>>> supports these.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>>>>>>> default behavior (when an older server doesn't send back 
>>>>>>>>>>>>>> capabilities) to be
>>>>>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in
>>>>>>>>>>>>>> the REST spec (as version 1)
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems
>>>>>>>>>>>>>> to be more informative in its nature as I don't think a client 
>>>>>>>>>>>>>> would act
>>>>>>>>>>>>>> differently right now when seeing these.
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>> >>>> Eduard
>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ryan Blue
>>>>>>>>> Databricks
>>>>>>>>>
>>>>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>> --
>>>> Robert Stupp
>>>> @snazy
>>>>
>>>>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to