Re: [DISCUSS] Describing REST Server capabilities

Steven Wu Tue, 30 Jul 2024 12:24:37 -0700

> (2) version the entire catalog spec. A released catalog spec version
> will contain a list of configs it supports, and also a set of APIs and all
> features embedded in the APIs. A server will report the specific catalog
> version it adheres to, and then document the nuances.


Jack, just to clarify, are you talking about the endpoint path like "/v1/"?
Also, does that mean every API/feature addition would require a catalog
version bump?

On Tue, Jul 30, 2024 at 8:34 AM Jack Ye <yezhao...@gmail.com> wrote:

> Since the catalog sync was canceled this week, I think it is better to
> reply here with my latest take on this topic.
>
> I think we have 2 discussions intertwined here, that I would like to
> decouple if possible.
>
> (1) is it worth having a concept of capabilities to control client
> behaviors?
> (2) suppose we introduce capabilities, is it worth having versioned
> capabilities?
>
> Personally speaking I am currently still more inclined to not have
> capabilities. An alternative here is to keep doing what has been done for
> metrics API, which is to introduce feature flags like
> rest-metrics-reporting-enabled. One strong argument I saw for this
> alternative is that a feature flag can express non-binary options. For
> capabilities, you are bound to say just whether the server has this
> capability or not. But what we really want is to control client behavior
> based on the capability. And for that, there could be multiple options for
> the client to interact with the server in the presence/absence of a feature.
> For example, for multi-table commit, there could be 2 different behaviors
> when the server does not support the endpoint: (1) fail the operation
> early, (2) fall back to using a single-table commit for each table.
>
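To make the non-binary point in the paragraph above concrete, here is a minimal sketch in Python. The policy values and the function `plan_commit` are hypothetical names invented for illustration; they are not part of the REST spec or of the proposal. The sketch only shows that a flag with more than two values can select between the two behaviors described.

```python
from enum import Enum


class MissingEndpointPolicy(Enum):
    """Hypothetical values for a non-binary feature flag (not in the spec)."""
    FAIL_EARLY = "fail-early"
    FALLBACK_SINGLE_TABLE = "fallback-single-table"


def plan_commit(tables, server_supports_multi_table, policy):
    """Return the list of (operation, tables) calls the client would make."""
    if server_supports_multi_table:
        return [("multi-table-commit", list(tables))]
    if policy is MissingEndpointPolicy.FAIL_EARLY:
        # Behavior (1): refuse before any work is done.
        raise RuntimeError("server does not support multi-table commit")
    # Behavior (2): one single-table commit per table.
    # Note that this gives up atomicity across tables.
    return [("single-table-commit", [t]) for t in tables]
```

A plain boolean capability could tell the client that the endpoint is missing, but not which of the two branches to take.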
> And with this alternative, there are of course no versioned capabilities.
> But I think the reason we want versioned capabilities is because we want a
> general versioning story for the catalog spec with forward and backward
> compatibility guarantees. If that is the goal, why not: (1) acknowledge the
> feature flag configs as a part of the spec, (2) version the entire catalog
> spec. A released catalog spec version will contain a list of configs it
> supports, and also a set of APIs and all features embedded in the APIs. A
> server will report the specific catalog version it adheres to, and then
> document the nuances. I feel this would put catalog providers in a more
> comfortable situation, as they now have a stable catalog spec to adhere to
> as the basis, that does not just automatically evolve within the same
> version. They can implement a catalog spec and upgrade at their own pace
> following a common versioning semantics. They will also report whatever
> level of support and detailed behaviors they want, without the need to tie
> specific behaviors to different capabilities.
>
> I think we have been spending quite a long time on this topic, but this is
> so fundamental that I feel we should think through the alternatives. Would
> it be possible to at least document in the design proposal why the
> alternatives are not desirable, what are the pros and cons?
>
> -Jack
>
> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I've written up
>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit
>> to provide an easier way of giving feedback on the proposal.
>> Please take a look so that we can discuss how we'd like to handle the
>> default fallback behavior (*tables* vs *everything that's currently in
>> the spec*) when a newer client talks to an older server.
>>
>>
>> Eduard
>>
>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov
>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>
>>> So I would argue to define the current set of APIs and specs as the
>>>> default if the `capabilities` field is missing.
>>>
>>>
>>> There have been two sides to this in prior discussions. Having *tables*
>>> as the default vs having what's *currently in the spec* as the default.
>>> The argument for having *tables* as the default is because we can't
>>> assume that every REST server out there already supports views.
>>>
>>>
>>> Can we assume that a server that does not declare capabilities does NOT
>>> implement views? IMHO, that assumption is too strong and will break use
>>> cases when the client is upgraded, but the server is not.
>>>
>>> Before capabilities were introduced, clients used to work in a certain
>>> way. I think when the client starts interpreting capabilities, but the
>>> server does not declare the capabilities property at all, the client should
>>> (by default) work the same way as when it did not expect capabilities to be
>>> declared.
>>>
>>>
>>> Hence we're opting for the middle ground with *tables* + having a
>>> *configurable fallback mechanism*. Servers that already support views
>>> can configure their clients to default to *tables / views*, meaning that
>>> no additional (manual) configuration from a client's perspective is
>>> required to get table & view behavior.
>>>
>>>
>>> Forcing a server upgrade when users just want to upgrade the client is
>>> too much of a burden, I think. Servers and clients are often managed by
>>> different groups of people.
>>>
>>> In the end, if I recall previous posts in this thread correctly, declaring
>>> server capabilities is an optimization to allow more efficient / less
>>> error-prone client operation. I do not think it should impose additional
>>> functional / interoperability requirements on servers.
>>>
>>> Cheers,
>>> Dmitri.
>>>
>>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner <
>>> etudenhoef...@apache.org> wrote:
>>>
>>>> Current servers do not send a `capabilities` field at all. You're
>>>>> suggesting to use a new `rest-default-capabilities` property to let newer
>>>>> clients assume `1`.  Once the table/view/etc-spec capabilities are needed,
>>>>> those newer clients would assume table-spec v1. That's wrong IMO.
>>>>
>>>>
>>>> That statement I mentioned only applies to the capabilities that are
>>>> currently in the PR and not to *table-spec / view-spec*.
>>>>
>>>>
>>>> I'm not a fan of a `rest-default-capabilities` property at all, because
>>>>> every user has to configure it explicitly and correctly
>>>>>
>>>>
>>>> As I mentioned, servers can configure this for *all* of their clients
>>>> via the *config* endpoint, so clients wouldn't have to do this
>>>> *manually*.
>>>>
>>>>
>>>> So I would argue to define the current set of APIs and specs as the
>>>>> default if the `capabilities` field is missing.
>>>>
>>>>
>>>> There have been two sides to this in prior discussions. Having *tables*
>>>> as the default vs having what's *currently in the spec* as the
>>>> default. The argument for having *tables* as the default is because we
>>>> can't assume that every REST server out there already supports views.
>>>>
>>>> Hence we're opting for the middle ground with *tables* + having a
>>>> *configurable fallback mechanism*. Servers that already support views
>>>> can configure their clients to default to *tables / views*, meaning that
>>>> no additional (manual) configuration from a client's perspective is
>>>> required to get table & view behavior.
>>>>
>>>> Eduard
>>>>
>>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote:
>>>>
>>>>> Sorry, I don't understand the two suggestions, especially when used in
>>>>> combination. Current servers do not send a `capabilities` field at all.
>>>>> You're suggesting to use a new `rest-default-capabilities` property to let
>>>>> newer clients assume `1`.  Once the table/view/etc-spec capabilities are
>>>>> needed, those newer clients would assume table-spec v1. That's wrong IMO.
>>>>>
>>>>> I'm not a fan of a `rest-default-capabilities` property at all,
>>>>> because every user has to configure it explicitly and correctly. I predict
>>>>> quite some users not doing this or not doing it correctly, causing some
>>>>> trouble that can be prevented. The way things are configured is already
>>>>> quite complex, and yet adding another option adds more complexity to
>>>>> Iceberg. So I would argue to define the current set of APIs and specs as
>>>>> the default if the `capabilities` field is missing.
>>>>>
>>>>> Just because the *current* implementation doesn't use
>>>>> table-spec/view-spec doesn't mean near future clients wouldn't need it -
>>>>> table-spec v3 isn't that far away. And with new data types, view-spec v2
>>>>> isn't far away either.
>>>>>
>>>>> Adding table-spec + view-spec capabilities now saves a lot of
>>>>> headaches for Iceberg users in the near future.
>>>>>
>>>>>
>>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>>>>
>>>>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities
>>>>> later when new requirements/updates get added. The current implementation
>>>>> wouldn't make any use of these capabilities, so I don't see a good enough
>>>>> reason to add them at this point.
>>>>>
>>>>> The PR currently says: "tables -> default capability in case the
>>>>>> `capabilities` property doesn't exist or is empty in the response" -
>>>>>> meaning: the server would _only_ support tables. This phrase in the spec
>>>>>> proposal effectively removes the view functionality from all currently
>>>>>> existing Iceberg REST implementations.
>>>>>
>>>>>
>>>>> This is why the configurable fallback mechanism was mentioned in the
>>>>> Catalog sync, which can be realized with
>>>>> *rest-default-capabilities=tables,views,abc,xyz* (all of them
>>>>> defaulting to version 1). A server could send that property via the config
>>>>> route without clients having to change anything.
>>>>>
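The fallback described above could be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: the function name `resolve_capabilities` is hypothetical, the shape of the `capabilities` field (a mapping from capability name to a list of versions) is an assumption, and so is the precedence of server `defaults`, client properties, and server `overrides`. Only the property name `rest-default-capabilities` and the "version 1" default come from the thread.

```python
def resolve_capabilities(config_response, client_properties):
    """Decide which capabilities (name -> versions) the client should assume.

    Assumed shapes: `config_response` is the parsed JSON from the config
    route; `capabilities`, if present, maps capability name -> versions.
    """
    declared = config_response.get("capabilities")
    if declared:
        # The server declared its capabilities explicitly, so use them as-is.
        return {name: sorted(versions) for name, versions in declared.items()}

    # Assumed precedence: server defaults < client properties < server overrides.
    merged = {
        **config_response.get("defaults", {}),
        **client_properties,
        **config_response.get("overrides", {}),
    }
    raw = merged.get("rest-default-capabilities", "tables")
    names = [n.strip() for n in raw.split(",") if n.strip()] or ["tables"]
    # Every capability taken from the fallback property defaults to version 1.
    return {name: [1] for name in names}
```

With this shape, a server that already supports views only needs to return `rest-default-capabilities=tables,views` from the config route, and clients need no manual configuration.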
>>>>>
>>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I still have concerns regarding the missing table-spec/view-spec
>>>>>> capabilities. Newer clients can send create/update requests with
>>>>>> requirements/updates of newer Iceberg table/view/udf specs to a server
>>>>>> that doesn't support those spec versions - the outcome is rather
>>>>>> undefined. What should a server do? Ignore the unknown fields and
>>>>>> requirement/update types and hence do what it's potentially _not_
>>>>>> supposed to do? Reply with a then ambiguous 501 (is it the endpoint
>>>>>> that's not implemented or the request content not supported)?
>>>>>> Similarly, what if a server decides to not support for example
>>>>>> table-spec v1 and just drop the manifest-file list in a table
>>>>>> snapshot, leading to data loss?
>>>>>>
>>>>>> IMO capabilities must contain the table/view/... spec versions
>>>>>> supported by the server.
>>>>>>
>>>>>> There's also the concern about the behavior if the `capabilities`
>>>>>> field is missing (see
>>>>>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not
>>>>>> sure why the comment thread's resolved). The PR currently says: "tables
>>>>>> -> default capability in case the `capabilities` property doesn't exist
>>>>>> or is empty in the response" - meaning: the server would _only_ support
>>>>>> tables. This phrase in the spec proposal effectively removes the view
>>>>>> functionality from all currently existing Iceberg REST implementations.
>>>>>>
>>>>>>
>>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>>>>>>
>>>>>> Are there any other concerns with the proposal or should we start a
>>>>>> VOTE thread?
>>>>>>
>>>>>> Eduard
>>>>>>
>>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>> capability that a client can / should discover. It is more like
>>>>>>>> something that the server instructs / configures the client to do.
>>>>>>>
>>>>>>>
>>>>>>> While a server can control this behavior and instruct the client to
>>>>>>> use remote signing, technically nothing is preventing a client from
>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it seems
>>>>>>> more appropriate to indicate that this capability isn't supported rather
>>>>>>> than a generic 501, because not every server will support remote
>>>>>>> signing.
>>>>>>>
>>>>>>>
>>>>>>> Good point regarding clients taking initiative and using request
>>>>>>> signing without an explicit server-provided config. It moves the client
>>>>>>> operations into a mode where the server has more control (over having
>>>>>>> longer term client-side credentials), so it looks like a reasonable
>>>>>>> mode to support from the security perspective.
>>>>>>>
>>>>>>> Let's keep that capability flag.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I've added a few inline comments below.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>> capability that a client can / should discover. It is more like
>>>>>>>>> something that the server instructs / configures the client to do.
>>>>>>>>
>>>>>>>>
>>>>>>>> While a server can control this behavior and instruct the client to
>>>>>>>> use remote signing, technically nothing is preventing a client from
>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it
>>>>>>>> seems more appropriate to indicate that this capability isn't supported
>>>>>>>> rather than a generic 501, because not every server will support remote
>>>>>>>> signing.
>>>>>>>>
>>>>>>>> The *vended-credentials* capability on the other hand is more
>>>>>>>> informative in its nature and a server indeed configures a client. I
>>>>>>>> think that was also one of the reasons I removed this capability but
>>>>>>>> added it back later due to a comment from Jack.
>>>>>>>>
>>>>>>>> I'm ok either way in terms of removing / keeping
>>>>>>>> *vended-credentials* as a capability but given that we'd want to
>>>>>>>> include *actionable* capabilities at this point, I'd just remove
>>>>>>>> it (nothing is preventing us from adding it later if necessary).
>>>>>>>>
>>>>>>>>
>>>>>>>> In that case, why do we need all these other capabilities like
>>>>>>>>> tables, remote-signing, etc. in the first place?
>>>>>>>>
>>>>>>>>
>>>>>>>> Given that capabilities also carry versioning information, clients
>>>>>>>> can make more informed decisions on which endpoints to call. One could
>>>>>>>> argue that generally throwing a 501 on everything that isn't supported
>>>>>>>> might be sufficient, but that doesn't necessarily help a client in
>>>>>>>> knowing which versions of a capability are safe to call/use.
>>>>>>>>
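As a rough illustration of how versioned capabilities could inform a client, here is a small Python sketch. The helper `pick_version` is a hypothetical name, and the representation of capabilities as a mapping from name to supported versions is an assumption. It simply picks the highest version that both sides understand, or nothing at all.

```python
def pick_version(server_capabilities, capability, client_versions):
    """Return the highest version of `capability` shared by client and server.

    Returns None when the server does not declare the capability or when
    there is no version in common, so the caller can fail early or fall back.
    """
    server_versions = server_capabilities.get(capability, [])
    common = set(server_versions) & set(client_versions)
    return max(common) if common else None
```

A bare 501 only tells the client after the fact that a call was unsupported; a lookup like this lets it decide before sending the request.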
>>>>>>>> Regarding the control of client-side fallback behavior:
>>>>>>>> I think the default fallback behavior should be *tables* (with
>>>>>>>> version 1) with a property in the REST catalog that allows configuring
>>>>>>>> this to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of
>>>>>>>> them defaulting to version 1).
>>>>>>>>
>>>>>>>>
>>>>>>>> Eduard
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria so
>>>>>>>>> far are that capabilities are used for:
>>>>>>>>> - controlling client-side fallback behavior
>>>>>>>>> - failing expensive operations early if we know they will eventually
>>>>>>>>> fail due to a missing capability
>>>>>>>>>
>>>>>>>>> Do we agree that these are the criteria we should use? What about the
>>>>>>>>> other capabilities, namely tables, remote-signing, credential-vending?
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
>>>>>>>>> <b...@databricks.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>>>>
>>>>>>>>>> I think that it does in other cases. Multi-table commits, for
>>>>>>>>>> example, are a building block for multi-statement transactions. If a
>>>>>>>>>> service doesn't support multi-table commits then we ideally want
>>>>>>>>>> clients to know that ahead of time so that they don't run a big
>>>>>>>>>> transaction and then fail because the commit is not supported.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>>>>> capability that a client can / should discover. It is more like
>>>>>>>>>>> something that the server instructs / configures the client to do.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Dmitri.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I was reconciling the discussion yesterday. One point that was
>>>>>>>>>>>> interesting to me was that we agreed the purpose of these
>>>>>>>>>>>> capabilities is to "control client-side fallback behavior", or at
>>>>>>>>>>>> least the client should behave differently based on these
>>>>>>>>>>>> capabilities. However, this seems to be only needed so far for
>>>>>>>>>>>> views, or more specifically, for the loadView API only, because it
>>>>>>>>>>>> impacts the fallback behavior to resolve the identifier as a table
>>>>>>>>>>>> or not.
>>>>>>>>>>>>
>>>>>>>>>>>> For all the other capabilities listed, and even the other
>>>>>>>>>>>> endpoints in view, because a server can decide to implement it
>>>>>>>>>>>> partially anyway and just document the behavior, does it make a
>>>>>>>>>>>> difference if I declare the capability or not? The client will not
>>>>>>>>>>>> stop the request, the server will just error out if it is not
>>>>>>>>>>>> supported. Maybe the error is not in the expected code or message,
>>>>>>>>>>>> but it is still an error. In that case, why do we need all these
>>>>>>>>>>>> other capabilities like tables, remote-signing, etc. in the first
>>>>>>>>>>>> place?
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help
>>>>>>>>>>>> describe how the other capabilities could be used beyond
>>>>>>>>>>>> potentially returning an error earlier?
>>>>>>>>>>>>
>>>>>>>>>>>> -Jack
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > I've also added the 501 error to the response of the
>>>>>>>>>>>>> > respective endpoints but worth mentioning that *HEAD* / *GET*
>>>>>>>>>>>>> > requests must not return a 501
>>>>>>>>>>>>> > <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501>
>>>>>>>>>>>>> > (this implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>> > such a case).
>>>>>>>>>>>>>
>>>>>>>>>>>>> My reading of the Mozilla page makes me think that it is
>>>>>>>>>>>>> phrased too narrowly. Reading RFC 2616 [1] I believe that it does
>>>>>>>>>>>>> not preclude responding with 501 to GET and HEAD requests. I think
>>>>>>>>>>>>> it means that GET and HEAD methods must be supported by "general
>>>>>>>>>>>>> purpose" servers. The Iceberg REST server is not a general purpose
>>>>>>>>>>>>> server for resources. So, I think it should be fine to respond with
>>>>>>>>>>>>> 501 to unimplemented endpoints.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>>>>>>> fine-grained capabilities like *register-table /
>>>>>>>>>>>>>> table-metrics*.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The current capabilities (with versioning information) in the
>>>>>>>>>>>>>> PR are:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - tables
>>>>>>>>>>>>>>    - views
>>>>>>>>>>>>>>    - remote-signing
>>>>>>>>>>>>>>    - vended-credentials
>>>>>>>>>>>>>>    - multi-table-commit
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For servers that only *partially* implement endpoints under
>>>>>>>>>>>>>> a capability the spec requires the server to throw a *501
>>>>>>>>>>>>>> Not Implemented*. I've also added the 501 error to the
>>>>>>>>>>>>>> response of the respective endpoints but worth mentioning that
>>>>>>>>>>>>>> *HEAD* / *GET* requests must not return a 501
>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501>
>>>>>>>>>>>>>> (this implies that the server impl would e.g. return a *404* in
>>>>>>>>>>>>>> such a case).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Eduard
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't
>>>>>>>>>>>>>>> implement all endpoints. It means that the server will at least
>>>>>>>>>>>>>>> have to implement empty endpoints if needed (that makes sense to
>>>>>>>>>>>>>>> me).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think we should focus on only "identified capabilities". I
>>>>>>>>>>>>>>> think that I proposed before that the capabilities can be
>>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid we
>>>>>>>>>>>>>>> won't be flexible enough or will always be behind the
>>>>>>>>>>>>>>> implementation (if an implementation wants to add "my-foo-cap").
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the
>>>>>>>>>>>>>>> > requirement on having to implement all endpoints under a
>>>>>>>>>>>>>>> > particular capability.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > For servers that only partially implement endpoints under
>>>>>>>>>>>>>>> > a capability the spec requires the server to throw a 501 Not
>>>>>>>>>>>>>>> > Implemented. This was suggested by Jack and it seems reasonable
>>>>>>>>>>>>>>> > to do that.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>>>>>>> > capability: I think this might make sense for the next
>>>>>>>>>>>>>>> > iteration of the REST spec but as I mentioned earlier I don't
>>>>>>>>>>>>>>> > see any clear benefit for the current REST spec as the client
>>>>>>>>>>>>>>> > wouldn't do anything with that information.
>>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this can
>>>>>>>>>>>>>>> > still be added later to the current REST spec but I believe we
>>>>>>>>>>>>>>> > should have a few well-defined and actionable capabilities
>>>>>>>>>>>>>>> > rather than too many.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Eduard
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu
>>>>>>>>>>>>>>> > <liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>>>>> >>> how do you envision this to be used?
>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>>> >>> are 2 approaches we can go with, taking CreateTable as an
>>>>>>>>>>>>>>> >>> example:
>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>>>>> >>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created
>>>>>>>>>>>>>>> >>> and allow creating tables in the v3 version.
>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>>> >>> both v2 and v3 fields, and the server enforces the payload
>>>>>>>>>>>>>>> >>> differently based on the TableMetadata.format-version field. If
>>>>>>>>>>>>>>> >>> the server does not support v3, it can return unsupported at
>>>>>>>>>>>>>>> >>> that time.
>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>> >>> to be a capability. (1) seems to be cleaner, but has some
>>>>>>>>>>>>>>> >>> overhead in provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>> >>> table-spec version?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current
>>>>>>>>>>>>>>> >> behavior, since the /v1/tables operation supports both v1 and
>>>>>>>>>>>>>>> >> v3. We should go to 2 only when we have incompatible
>>>>>>>>>>>>>>> >> fields/breaking changes, according to the discussion.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Generally I agree with adding table-spec into
>>>>>>>>>>>>>>> >> capabilities. For example, we can expose this to the user in
>>>>>>>>>>>>>>> >> the API so that the user can choose a supported table format
>>>>>>>>>>>>>>> >> version without an exception being thrown.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye
>>>>>>>>>>>>>>> >> <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>>>>> >>> how do you envision this to be used?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there
>>>>>>>>>>>>>>> >>> are 2 approaches we can go with, taking CreateTable as an
>>>>>>>>>>>>>>> >>> example:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>>>>> >>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created
>>>>>>>>>>>>>>> >>> and allow creating tables in the v3 version.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>>>>> >>> both v2 and v3 fields, and the server enforces the payload
>>>>>>>>>>>>>>> >>> differently based on the TableMetadata.format-version field. If
>>>>>>>>>>>>>>> >>> the server does not support v3, it can return unsupported at
>>>>>>>>>>>>>>> >>> that time.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need
>>>>>>>>>>>>>>> >>> to be a capability. (1) seems to be cleaner, but has some
>>>>>>>>>>>>>>> >>> overhead in provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the
>>>>>>>>>>>>>>> >>> table-spec version?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>>>>>>> >>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting
>>>>>>>>>>>>>>> >>>> yesterday but I watched the recording today (thanks for
>>>>>>>>>>>>>>> >>>> providing that).
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware)
>>>>>>>>>>>>>>> >>>>> clients handle the case when a service does _not_ return the
>>>>>>>>>>>>>>> >>>>> capabilities field (absent). My proposal would be that a
>>>>>>>>>>>>>>> >>>>> client should in this case assume that all _currently_
>>>>>>>>>>>>>>> >>>>> existing capabilities are supported.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the
>>>>>>>>>>>>>>> >>>> current PR uses the tables capability (as version 1) as the
>>>>>>>>>>>>>>> >>>> default when a server doesn't return capabilities but it
>>>>>>>>>>>>>>> >>>> might be also ok to include views (as version 1) because the
>>>>>>>>>>>>>>> >>>> current client impl has some code to deal with errors in case
>>>>>>>>>>>>>>> >>>> endpoints don't exist.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing
>>>>>>>>>>>>>>> >>>> functionality in the REST spec is the default behavior to be
>>>>>>>>>>>>>>> >>>> assumed for older servers, I'm not sure about including
>>>>>>>>>>>>>>> >>>> remote-signing / multi-table-commit / register-table /
>>>>>>>>>>>>>>> >>>> table-metrics as it has been indicated in earlier comments on
>>>>>>>>>>>>>>> >>>> the PR/ML that not every REST server supports these.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>>>>>>>> >>>> default behavior (when an older server doesn't send back
>>>>>>>>>>>>>>> >>>> capabilities) to be
>>>>>>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in
>>>>>>>>>>>>>>> >>>> the REST spec (as version 1)
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems
>>>>>>>>>>>>>>> >>>> to be more informative in its nature as I don't think a
>>>>>>>>>>>>>>> >>>> client would act differently right now when seeing these.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>>>>> >>>> Eduard
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ryan Blue
>>>>>>>>>> Databricks
>>>>>>>>>>
>>>>>>>>> --
>>>>>> Robert Stupp
>>>>>> @snazy
>>>>>>
>>>>>> --
>>>>> Robert Stupp
>>>>> @snazy
>>>>>
>>>>>
