One thing to clarify, regarding per-endpoint versioning, my understanding is that endpoint version should bump (e.g. GET /v1/namespaces to GET /v2/namespaces) when there is a significant backwards incompatible change.
-Jack On Tue, Jul 30, 2024 at 7:56 PM Jack Ye <yezhao...@gmail.com> wrote: > > are you talking about the endpoint path like "/v1/" > > No, I mean in general the catalog spec version, which is marked currently > as 0.0.1: > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L27 > > And in my mind it would be you periodically release a version of the > catalog spec, just like you release Iceberg java libraries and pyiceberg. > And catalog providers will choose to upgrade to that version, and decide if > anything has changed for their existing support, add new support for new > features, document new restrictions and limitations. Basically just go > through a typical product lifecycle when integrating with the spec. > > -Jack > > > On Tue, Jul 30, 2024 at 12:24 PM Steven Wu <stevenz...@gmail.com> wrote: > >> > (2) version the entire catalog spec. A released catalog spec version >> will contain a list of configs it supports, and also a set of APIs and all >> features embedded in the APIs. A server will report the specific catalog >> version it adheres to, and then document the nuances. >> >> Jack, just to clarify, are you talking about the endpoint path like >> "/v1/"? Also, does that mean every API/feature addition would require a >> catalog version bump? >> >> On Tue, Jul 30, 2024 at 8:34 AM Jack Ye <yezhao...@gmail.com> wrote: >> >>> Since the catalog sync was canceled this week, I find maybe it is better >>> to reply here for my latest take on this topic. >>> >>> I think we have 2 discussions intertwined here, that I would like to >>> decouple if possible. >>> >>> (1) is it worth having a concept of capabilities to control client >>> behaviors? >>> (2) suppose we introduce capabilities, is it worth having versioned >>> capabilities? >>> >>> Personally speaking I am currently still more inclined to not have >>> capabilities. An alternative here is to keep doing what has been done for >>> metrics API, which is to introduce feature flags like >>> rest-metrics-reporting-enabled. One strong argument I saw for this >>> alternative is that a feature flag can express non-binary options. For >>> capabilities, you are bound to say just whether the server has this >>> capability or not. But what we really want is to control client behavior >>> based on the capability. And for that, there could be multiple options for >>> the client to interact with the server in existence/absence of a feature. >>> For example, for multi-table commit, there could be 2 different behaviors >>> when the server does not support the endpoint, (1) fail the operation >>> early, (2) fallback to use single-table commit for each table. >>> >>> And with this alternative, there is of course no versioned capabilities. >>> But I think the reason we want versioned capabilities is because we want a >>> general versioning story for the catalog spec with forward and backward >>> compatibility guarantees. If that is the goal, why not: (1) acknowledge the >>> feature flag configs as a part of the spec, (2) version the entire catalog >>> spec. A released catalog spec version will contain a list of configs it >>> supports, and also a set of APIs and all features embedded in the APIs. A >>> server will report the specific catalog version it adheres to, and then >>> document the nuances. I feel this would put catalog providers in a more >>> comfortable situation, as they now have a stable catalog spec to adhere to >>> as the basis, that does not just automatically evolve within the same >>> version. They can implement a catalog spec and upgrade at their own pace >>> following a common versioning semantics. They will also report whatever >>> level of support and detailed behaviors they want, without the need to tie >>> specific behaviors to different capabilities. >>> >>> I think we have been spending quite a long time on this topic, but this >>> is so fundamental that I feel we should think through the alternatives. >>> Would it be possible to at least document in the design proposal why the >>> alternatives are not desirable, what are the pros and cons? >>> >>> -Jack >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner < >>> etudenhoef...@apache.org> wrote: >>> >>>> Hey everyone, >>>> >>>> I've written up >>>> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit >>>> to >>>> provide an easier way of giving feedback to the proposal. >>>> Please take a look so that we can discuss how we'd like to handle the >>>> default fallback behavior (*tables* vs *everything that's currently in >>>> the spec*) when a newer client talks to an older server. >>>> >>>> >>>> Eduard >>>> >>>> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov >>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>> >>>>> So I would argue to define the current set of APIs and specs as the >>>>>> default if the `capabilities` field is missing. >>>>> >>>>> >>>>> There have been two sides to this in prior discussions. Having >>>>> *tables* as the default vs having what's *currently in the spec* as >>>>> the default. The argument for having *tables* as the default is >>>>> because we can't assume that every REST server out there already supports >>>>> views. >>>>> >>>>> >>>>> Can we assume that a server that does not declare capabilities does >>>>> NOT implement views? IMHO, that assumption is too strong and will break >>>>> use >>>>> cases when the client is upgraded, but the server is not. >>>>> >>>>> Before capabilities were introduced, clients used to work in a certain >>>>> way. I think when the client starts interpreting capabilities, but the >>>>> server does not declare the capabilities property at all, the client >>>>> should >>>>> (by default) work the same way as when it did not expect capabilities to >>>>> be >>>>> declared. >>>>> >>>>> >>>>> Hence we're opting for the middle ground with *tables* + having a >>>>> *configurable >>>>> fallback mechanism*. Servers that already support views can configure >>>>> their clients to default to *tables / views*, meaning that no >>>>> additional (manual) configuration from a client's perspective is required >>>>> to get table & view behavior. >>>>> >>>>> >>>>> Forcing a server upgrade when users just want to upgrade the client is >>>>> too much of a burden, I think. Servers and clients are often managed by >>>>> different groups of people. >>>>> >>>>> In the end, IIRC previous posts in this thread correctly, declaring >>>>> server capabilities is an optimization to allow more efficient / less >>>>> error-prone client operation. I do not think it should impose additional >>>>> functional / interoperability requirements on servers. >>>>> >>>>> Cheers, >>>>> Dmitri. >>>>> >>>>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner < >>>>> etudenhoef...@apache.org> wrote: >>>>> >>>>>> Current servers do not send a `capabilities` field at all. You're >>>>>>> suggesting to use a new `rest-default-capabilities` property to let >>>>>>> newer >>>>>>> clients assume `1`. Once the table/view/etc-spec capabilities are >>>>>>> needed, >>>>>>> those newer clients would assume table-spec v1. That's wrong IMO. >>>>>> >>>>>> >>>>>> That statement I mentioned only applies to the capabilities that are >>>>>> currently in the PR and not to *table-spec / view-spec*. >>>>>> >>>>>> >>>>>> I'm not a fan of a `rest-default-capabilities` property at all, >>>>>>> because every user has to configure it explicitly and correctly >>>>>>> >>>>>> >>>>>> As I mentioned, servers can configure this for *all* of their >>>>>> clients via the *config* endpoint, so clients wouldn't have to do >>>>>> this *manually*. >>>>>> >>>>>> >>>>>> So I would argue to define the current set of APIs and specs as the >>>>>>> default if the `capabilities` field is missing. >>>>>> >>>>>> >>>>>> There have been two sides to this in prior discussions. Having >>>>>> *tables* as the default vs having what's *currently in the spec* as >>>>>> the default. The argument for having *tables* as the default is >>>>>> because we can't assume that every REST server out there already supports >>>>>> views. >>>>>> >>>>>> Hence we're opting for the middle ground with *tables* + having a >>>>>> *configurable >>>>>> fallback mechanism*. Servers that already support views can >>>>>> configure their clients to default to *tables / views*, meaning that >>>>>> no additional (manual) configuration from a client's perspective is >>>>>> required to get table & view behavior. >>>>>> >>>>>> Eduard >>>>>> >>>>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote: >>>>>> >>>>>>> Sorry, I don't understand the two suggestions, especially when used >>>>>>> in combination. Current servers do not send a `capabilities` field at >>>>>>> all. >>>>>>> You're suggesting to use a new `rest-default-capabilities` property to >>>>>>> let >>>>>>> newer clients assume `1`. Once the table/view/etc-spec capabilities are >>>>>>> needed, those newer clients would assume table-spec v1. That's wrong >>>>>>> IMO. >>>>>>> >>>>>>> I'm not a fan of a `rest-default-capabilities` property at all, >>>>>>> because every user has to configure it explicitly and correctly. I >>>>>>> predict >>>>>>> quite some users not doing this or not doing it correctly, causing some >>>>>>> trouble that can be prevented. The way things are configured is already >>>>>>> quite complex, and yet adding another option adds more complexity to >>>>>>> Iceberg. So I would argue to define the current set of APIs and specs as >>>>>>> the default if the `capabilities` field is missing. >>>>>>> >>>>>>> Just because the *current* implementation doesn't use >>>>>>> table-spec/view-spec doesn't mean near future clients would need it - >>>>>>> table-spec v3 isn't that far away. And with new data types, view-spec v2 >>>>>>> isn't far away either. >>>>>>> >>>>>>> Adding table-spec + view-spec capabilities now saves a lot of >>>>>>> headaches for Iceberg users in the near future. >>>>>>> >>>>>>> >>>>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote: >>>>>>> >>>>>>> I would suggest adding *table-spec / view-spec / udf-spec *capabilities >>>>>>> later when new requirements/updates get added. The current >>>>>>> implementation >>>>>>> wouldn't make any use of these capabilities, so I don't see a good >>>>>>> enough >>>>>>> reason to add them at this point. >>>>>>> >>>>>>> The PR currently says: "tables -> default capability in case the >>>>>>>> `capabilities` property doesn't exist or is empty in the response" - >>>>>>>> meaning: the server would _only_ support tables. This phrase in the >>>>>>>> spec >>>>>>>> proposal effectively removes the view functionality from all currently >>>>>>>> existing Iceberg REST implementations. >>>>>>> >>>>>>> >>>>>>> This is why the configurable fallback mechanism was mentioned in the >>>>>>> Catalog sync, which can be realized with *r* >>>>>>> *est-default-capabilities=tables,views,abc,xyz* (all of them >>>>>>> defaulting to version 1). A server could send that property via the >>>>>>> config >>>>>>> route without having clients to change anything. >>>>>>> >>>>>>> >>>>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I still have concerns regarding the missing table-spec/view-spec >>>>>>>> capabilities. Newer clients can send create/update requests with >>>>>>>> requirements/updates of newer Iceberg table/view/udf specs to a server >>>>>>>> that >>>>>>>> doesn't support those spec versions - the outcome is rather undefined. >>>>>>>> What >>>>>>>> should a server do? Ignore the unknown fields and requirement/update >>>>>>>> types >>>>>>>> and hence do what it's potentially _not_ supposed to do? Reply with a >>>>>>>> then >>>>>>>> ambiguous 501 (is it the endpoint that's not implemented or the request >>>>>>>> content not supported)? Similar, what if a server decides to not >>>>>>>> support >>>>>>>> for example table-spec v1 and just drop the manifest-file list in a >>>>>>>> table >>>>>>>> snapshot leading to data loss? >>>>>>>> >>>>>>>> IMO capabilities must contain the table/view/... spec versions >>>>>>>> supported by the server. >>>>>>>> >>>>>>>> There's also the concern about the behavior if the `capabilties` >>>>>>>> field is missing (see >>>>>>>> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not >>>>>>>> sure why the comment thread's resolved). The PR currently says: >>>>>>>> "tables -> >>>>>>>> default capability in case the `capabilities` property doesn't exist >>>>>>>> or is >>>>>>>> empty in the response" - meaning: the server would _only_ support >>>>>>>> tables. >>>>>>>> This phrase in the spec proposal effectively removes the view >>>>>>>> functionality >>>>>>>> from all currently existing Iceberg REST implementations. >>>>>>>> >>>>>>>> >>>>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote: >>>>>>>> >>>>>>>> Are there any other concerns with the proposal or should we start a >>>>>>>> VOTE thread? >>>>>>>> >>>>>>>> Eduard >>>>>>>> >>>>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov >>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>> >>>>>>>>> Re: remote signing, I agree that it does not look like a server >>>>>>>>>> capability that a client can / should discover. It is more like >>>>>>>>>> something >>>>>>>>>> that the server instructs / configures the client to do. >>>>>>>>> >>>>>>>>> >>>>>>>>> While a server can control this behavior and instruct the client >>>>>>>>> to use remote signing, technically nothing is preventing a client from >>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it >>>>>>>>> seems more appropriate to indicate that this capability isn't >>>>>>>>> supported >>>>>>>>> rather than a generic 501, because not every server will support >>>>>>>>> remote >>>>>>>>> signing. >>>>>>>>> >>>>>>>>> >>>>>>>>> Good point regarding clients taking initiative and using request >>>>>>>>> singing without an explicit server-provided config. It moves the >>>>>>>>> client >>>>>>>>> operations into a mode where the server has more control (over having >>>>>>>>> longer term client-side credentials), so it looks like a reasonable >>>>>>>>> mode to >>>>>>>>> support from the security perspective. >>>>>>>>> >>>>>>>>> Let's keep that capability flag. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Dmitri. >>>>>>>>> >>>>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner < >>>>>>>>> etudenhoef...@apache.org> wrote: >>>>>>>>> >>>>>>>>>> Hey everyone, >>>>>>>>>> >>>>>>>>>> I've added a few inline comments below. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Re: remote signing, I agree that it does not look like a server >>>>>>>>>>> capability that a client can / should discover. It is more like >>>>>>>>>>> something >>>>>>>>>>> that the server instructs / configures the client to do. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> While a server can control this behavior and instruct the client >>>>>>>>>> to use remote signing, technically nothing is preventing a client >>>>>>>>>> from >>>>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it >>>>>>>>>> seems more appropriate to indicate that this capability isn't >>>>>>>>>> supported >>>>>>>>>> rather than a generic 501, because not every server will support >>>>>>>>>> remote >>>>>>>>>> signing. >>>>>>>>>> >>>>>>>>>> The *vended-credentials* capability on the other hand is more >>>>>>>>>> informative in its nature and a server indeed configures a client. I >>>>>>>>>> think >>>>>>>>>> that was also one of the reasons I removed this capability but added >>>>>>>>>> it >>>>>>>>>> later back due to a comment from Jack. >>>>>>>>>> >>>>>>>>>> I'm ok either way in terms of removing / keeping >>>>>>>>>> *vended-credentials* as a capability but given that we'd want to >>>>>>>>>> include *actionable* capabilities at this point, I'd just remove >>>>>>>>>> it (nothing is preventing us from adding it later if necessary). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In that case, why do we need all these other capabilities like >>>>>>>>>>> tables, remote-signing, etc. in the first place? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Given that capabilities also carry versioning information, >>>>>>>>>> clients can make more informed decisions on which endpoints to call. >>>>>>>>>> One >>>>>>>>>> could argue that generally throwing a 501 on everything that isn't >>>>>>>>>> supported might be sufficient, but that doesn't necessarily help a >>>>>>>>>> client >>>>>>>>>> in knowing which versions of a capability are safe to call/use. >>>>>>>>>> >>>>>>>>>> Regarding the control of client-side fallback behavior: >>>>>>>>>> I think the default fallback behavior should be *tables* (with >>>>>>>>>> version 1) with a property in the REST catalog that allows >>>>>>>>>> configuring this >>>>>>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of >>>>>>>>>> them defaulting to version 1). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Eduard >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria >>>>>>>>>>> so far is that capabilities are used for: >>>>>>>>>>> - controlling client-side fallback behavior >>>>>>>>>>> - failing expensive operations early if we know it will >>>>>>>>>>> eventually fail due to missing capability >>>>>>>>>>> >>>>>>>>>>> Do we agree if this is the criteria we should use? What about >>>>>>>>>>> the other capabilities, namly tables, remote-signing, >>>>>>>>>>> credential-vending? >>>>>>>>>>> >>>>>>>>>>> -Jack >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue >>>>>>>>>>> <b...@databricks.com.invalid> <b...@databricks.com.invalid> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> > does it make a difference if I declare the capability or not? >>>>>>>>>>>> >>>>>>>>>>>> I think that it does in other cases. Multi-table commits, for >>>>>>>>>>>> example, are a building block for multi-statement transactions. If >>>>>>>>>>>> a >>>>>>>>>>>> service doesn't support multi-table commits then we ideally want >>>>>>>>>>>> clients to >>>>>>>>>>>> know that ahead of time so that they don't run a big transaction >>>>>>>>>>>> and then >>>>>>>>>>>> fail because the commit is not supported. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov >>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Re: remote signing, I agree that it does not look like a >>>>>>>>>>>>> server capability that a client can / should discover. It is more >>>>>>>>>>>>> like >>>>>>>>>>>>> something that the server instructs / configures the client to do. >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Dmitri. >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I was reconciling the discussion yesterday, one point that >>>>>>>>>>>>>> was interesting to me was that we agreed the purpose of these >>>>>>>>>>>>>> capabilities >>>>>>>>>>>>>> is to "control client-side fallback behavior", or at least the >>>>>>>>>>>>>> client >>>>>>>>>>>>>> should behave differently based on these capabilities. However, >>>>>>>>>>>>>> this seems >>>>>>>>>>>>>> to be only needed so far for views, or more specifically, for >>>>>>>>>>>>>> loadView API >>>>>>>>>>>>>> only because it impacts the fallback behavior to resolve the >>>>>>>>>>>>>> identifier as >>>>>>>>>>>>>> a table or not. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For all the other capabilities listed, and even the other >>>>>>>>>>>>>> endpoints in view, because a server can decide to implement it >>>>>>>>>>>>>> partially >>>>>>>>>>>>>> anyway and just document the behavior, does it make a difference >>>>>>>>>>>>>> if I >>>>>>>>>>>>>> declare the capability or not? The client will not stop the >>>>>>>>>>>>>> request, the >>>>>>>>>>>>>> server will just error out if it is not supported. Maybe the >>>>>>>>>>>>>> error is not >>>>>>>>>>>>>> in the expected code or message, but it is still an error. In >>>>>>>>>>>>>> that case, >>>>>>>>>>>>>> why do we need all these other capabilities like tables, >>>>>>>>>>>>>> remote-signing, >>>>>>>>>>>>>> etc. in the first place? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help >>>>>>>>>>>>>> describe how the other capabilities could be used beyond >>>>>>>>>>>>>> potentially >>>>>>>>>>>>>> returning an error earlier? >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Jack >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov >>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Eduard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> > I've also added the 501 error to the response of the >>>>>>>>>>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET >>>>>>>>>>>>>>> *requests must not return a 501 >>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> >>>>>>>>>>>>>>> (this >>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in >>>>>>>>>>>>>>> such a case). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> My reading on the Mozilla page makes me think that it is >>>>>>>>>>>>>>> phrased too narrowly. Reading RFC 2616 [1] I believe that it >>>>>>>>>>>>>>> does not >>>>>>>>>>>>>>> preclude responding with 501 to GET and HEAD requests. I think >>>>>>>>>>>>>>> it means >>>>>>>>>>>>>>> that GET and HEAD methods must be supported by "general >>>>>>>>>>>>>>> purpose" servers. >>>>>>>>>>>>>>> The Iceberg REST server is not a general purpose server for >>>>>>>>>>>>>>> resources. So, >>>>>>>>>>>>>>> I think it should be fine to respond with 501 to unimplemented >>>>>>>>>>>>>>> endpoints. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Dmitri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner < >>>>>>>>>>>>>>> etudenhoef...@apache.org> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey everyone, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I watched the catalog sync recording today and updated the >>>>>>>>>>>>>>>> PR <https://github.com/apache/iceberg/pull/9940> to remove >>>>>>>>>>>>>>>> fine-grained capabilities like *register-table / >>>>>>>>>>>>>>>> table-metrics*. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The current capabilities (with versioning information) in >>>>>>>>>>>>>>>> the PR are: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - tables >>>>>>>>>>>>>>>> - views >>>>>>>>>>>>>>>> - remote-signing >>>>>>>>>>>>>>>> - vended-credentials >>>>>>>>>>>>>>>> - multi-table-commit >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For servers that only *partially* implement endpoints >>>>>>>>>>>>>>>> under a capability the spec requires the server to throw a *501 >>>>>>>>>>>>>>>> Not Implemented*. I've also added the 501 error to the >>>>>>>>>>>>>>>> response of the respective endpoints but worth mentioning that >>>>>>>>>>>>>>>> *HEAD* / *GET *requests must not return a 501 >>>>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> >>>>>>>>>>>>>>>> (this >>>>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in >>>>>>>>>>>>>>>> such a case). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>> Eduard >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré < >>>>>>>>>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Eduard, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't >>>>>>>>>>>>>>>>> implement all >>>>>>>>>>>>>>>>> endpoints. It means that the server will at least have to >>>>>>>>>>>>>>>>> implement >>>>>>>>>>>>>>>>> empty endpoints if needed (that makes sense to me). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think we should focus on only "identified capabilities". >>>>>>>>>>>>>>>>> I think >>>>>>>>>>>>>>>>> that I proposed before that the capabilities can be >>>>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm >>>>>>>>>>>>>>>>> afraid we >>>>>>>>>>>>>>>>> won't be flexible enough or always behind the >>>>>>>>>>>>>>>>> implementation (if an >>>>>>>>>>>>>>>>> implementation wants to add "my-foo-cap"). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>>>>>>>>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the >>>>>>>>>>>>>>>>> requirement on having to implement all endpoints under a >>>>>>>>>>>>>>>>> particular >>>>>>>>>>>>>>>>> capability. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > For servers that only partially implement endpoints >>>>>>>>>>>>>>>>> under a capability the spec requires the server to throw a >>>>>>>>>>>>>>>>> 501 Not >>>>>>>>>>>>>>>>> Implemented. This was suggested by Jack and it seems >>>>>>>>>>>>>>>>> reasonable to do that. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a >>>>>>>>>>>>>>>>> capability: I think this might make sense for the next >>>>>>>>>>>>>>>>> iteration of the >>>>>>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear >>>>>>>>>>>>>>>>> benefit for the >>>>>>>>>>>>>>>>> current REST spec as the client wouldn't do anything with >>>>>>>>>>>>>>>>> that information. >>>>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this >>>>>>>>>>>>>>>>> can still be added later to the current REST spec but I >>>>>>>>>>>>>>>>> believe we should >>>>>>>>>>>>>>>>> rather have a few well-defined and actionable capabilities >>>>>>>>>>>>>>>>> rather than too >>>>>>>>>>>>>>>>> many. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > Eduard >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu < >>>>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. >>>>>>>>>>>>>>>>> Robert, how do you envision this to be used? >>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, >>>>>>>>>>>>>>>>> there are 2 approaches we can go with, taking CreateTable as >>>>>>>>>>>>>>>>> an example: >>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which >>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be >>>>>>>>>>>>>>>>> created and >>>>>>>>>>>>>>>>> allow creating tables in the v3 version. >>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to >>>>>>>>>>>>>>>>> support both v2 and v3 fields, and the server enforces the >>>>>>>>>>>>>>>>> payload >>>>>>>>>>>>>>>>> differently based on the TableMetadata.format-version field. >>>>>>>>>>>>>>>>> If the server >>>>>>>>>>>>>>>>> does not support v3, it can return unsupported at that time. >>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need >>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some >>>>>>>>>>>>>>>>> overhead in >>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the >>>>>>>>>>>>>>>>> table-spec version? >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current >>>>>>>>>>>>>>>>> behavior, since /v1/tables operation supports both v1 and v3. >>>>>>>>>>>>>>>>> We should >>>>>>>>>>>>>>>>> only go to 2 only when we have incompatible fields/break >>>>>>>>>>>>>>>>> changes according >>>>>>>>>>>>>>>>> to discussion. >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Generally I agree with adding table-spec into >>>>>>>>>>>>>>>>> capabilities. For example, we can expose this to user in api >>>>>>>>>>>>>>>>> so that user >>>>>>>>>>>>>>>>> could choose a supported table format version without >>>>>>>>>>>>>>>>> throwing exception. >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye < >>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. >>>>>>>>>>>>>>>>> Robert, how do you envision this to be used? >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, >>>>>>>>>>>>>>>>> there are 2 approaches we can go with, taking CreateTable as >>>>>>>>>>>>>>>>> an example: >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which >>>>>>>>>>>>>>>>> means that POST /v2/{prefix}/namespaces/{ns}/tables will be >>>>>>>>>>>>>>>>> created and >>>>>>>>>>>>>>>>> allow creating tables in the v3 version. >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to >>>>>>>>>>>>>>>>> support both v2 and v3 fields, and the server enforces the >>>>>>>>>>>>>>>>> payload >>>>>>>>>>>>>>>>> differently based on the TableMetadata.format-version field. >>>>>>>>>>>>>>>>> If the server >>>>>>>>>>>>>>>>> does not support v3, it can return unsupported at that time. >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need >>>>>>>>>>>>>>>>> to be a capability. (1) seems to be cleaner, but has some >>>>>>>>>>>>>>>>> overhead in >>>>>>>>>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the >>>>>>>>>>>>>>>>> table-spec version? >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> -Jack >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner >>>>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> >>>>>>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote: >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting >>>>>>>>>>>>>>>>> yesterday but I watched the recording today (thanks for >>>>>>>>>>>>>>>>> providing that). >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) >>>>>>>>>>>>>>>>> clients handle the case when a service does _not_ return the >>>>>>>>>>>>>>>>> capabilities >>>>>>>>>>>>>>>>> field (absent). My proposal would be that a client should in >>>>>>>>>>>>>>>>> this case >>>>>>>>>>>>>>>>> assume that all _currently_ existing capabilities are >>>>>>>>>>>>>>>>> supported. >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> - tables: [1] >>>>>>>>>>>>>>>>> >>>>> - views: [1] >>>>>>>>>>>>>>>>> >>>>> - remote-signing: [1] >>>>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1] >>>>>>>>>>>>>>>>> >>>>> - register-table: [1] >>>>>>>>>>>>>>>>> >>>>> - table-metrics: [1] >>>>>>>>>>>>>>>>> >>>>> - table-spec: [1,2] >>>>>>>>>>>>>>>>> >>>>> - view-spec: [1,2] >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the >>>>>>>>>>>>>>>>> current PR uses the tables capability (as version 1) as the >>>>>>>>>>>>>>>>> default when a >>>>>>>>>>>>>>>>> server doesn't return capabilities but it might be also ok to >>>>>>>>>>>>>>>>> include views >>>>>>>>>>>>>>>>> (as version 1) because the current client impl has some code >>>>>>>>>>>>>>>>> to deal with >>>>>>>>>>>>>>>>> errors in case endpoints don't exist. >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing >>>>>>>>>>>>>>>>> functionality in the REST spec is the default behavior to be >>>>>>>>>>>>>>>>> assumed for >>>>>>>>>>>>>>>>> older server, I'm not sure about including remote-signing / >>>>>>>>>>>>>>>>> multi-table-commit / register-table / table-metrics as it has >>>>>>>>>>>>>>>>> been >>>>>>>>>>>>>>>>> indicated in earlier comments on the PR/ML that not every >>>>>>>>>>>>>>>>> REST server >>>>>>>>>>>>>>>>> supports these. >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want >>>>>>>>>>>>>>>>> the default behavior (when an older server doesn't send back >>>>>>>>>>>>>>>>> capabilities) >>>>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>>>> >>>> a) tables (version 1) only >>>>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in >>>>>>>>>>>>>>>>> the REST spec (as version 1) >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec >>>>>>>>>>>>>>>>> seems to be more informative in its nature as I don't think a >>>>>>>>>>>>>>>>> client would >>>>>>>>>>>>>>>>> act differently right now when seeing these. >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>> Thanks >>>>>>>>>>>>>>>>> >>>> Eduard >>>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Ryan Blue >>>>>>>>>>>> Databricks >>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>> Robert Stupp >>>>>>>> @snazy >>>>>>>> >>>>>>>> -- >>>>>>> Robert Stupp >>>>>>> @snazy >>>>>>> >>>>>>>