> (2) version the entire catalog spec. A released catalog spec version will contain a list of configs it supports, and also a set of APIs and all features embedded in the APIs. A server will report the specific catalog version it adheres to, and then document the nuances.
Jack, just to clarify, are you talking about the endpoint path like "/v1/"? Also, does that mean every API/feature addition would require a catalog version bump?

On Tue, Jul 30, 2024 at 8:34 AM Jack Ye <yezhao...@gmail.com> wrote:

> Since the catalog sync was canceled this week, I think it may be better to reply here with my latest take on this topic.
>
> I think we have 2 discussions intertwined here, that I would like to decouple if possible.
>
> (1) is it worth having a concept of capabilities to control client behaviors?
> (2) suppose we introduce capabilities, is it worth having versioned capabilities?
>
> Personally speaking I am currently still more inclined to not have capabilities. An alternative here is to keep doing what has been done for the metrics API, which is to introduce feature flags like rest-metrics-reporting-enabled. One strong argument I saw for this alternative is that a feature flag can express non-binary options. For capabilities, you are bound to say just whether the server has this capability or not. But what we really want is to control client behavior based on the capability. And for that, there could be multiple options for the client to interact with the server in existence/absence of a feature. For example, for multi-table commit, there could be 2 different behaviors when the server does not support the endpoint: (1) fail the operation early, (2) fall back to using single-table commit for each table.
>
> And with this alternative, there are of course no versioned capabilities. But I think the reason we want versioned capabilities is because we want a general versioning story for the catalog spec with forward and backward compatibility guarantees. If that is the goal, why not: (1) acknowledge the feature flag configs as a part of the spec, (2) version the entire catalog spec.
A released catalog spec version will contain a list of configs it > supports, and also a set of APIs and all features embedded in the APIs. A > server will report the specific catalog version it adheres to, and then > document the nuances. I feel this would put catalog providers in a more > comfortable situation, as they now have a stable catalog spec to adhere to > as the basis, that does not just automatically evolve within the same > version. They can implement a catalog spec and upgrade at their own pace > following a common versioning semantics. They will also report whatever > level of support and detailed behaviors they want, without the need to tie > specific behaviors to different capabilities. > > I think we have been spending quite a long time on this topic, but this is > so fundamental that I feel we should think through the alternatives. Would > it be possible to at least document in the design proposal why the > alternatives are not desirable, what are the pros and cons? > > -Jack > > > > > > > > > > > On Tue, Jul 16, 2024 at 5:47 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> Hey everyone, >> >> I've written up >> https://docs.google.com/document/d/1F1xh6SJhS-opgWRe1pPvWh01j8VHNHRocfCCFttKNf0/edit >> to >> provide an easier way of giving feedback to the proposal. >> Please take a look so that we can discuss how we'd like to handle the >> default fallback behavior (*tables* vs *everything that's currently in >> the spec*) when a newer client talks to an older server. >> >> >> Eduard >> >> On Mon, Jul 15, 2024 at 4:24 PM Dmitri Bourlatchkov >> <dmitri.bourlatch...@dremio.com.invalid> wrote: >> >>> So I would argue to define the current set of APIs and specs as the >>>> default if the `capabilities` field is missing. >>> >>> >>> There have been two sides to this in prior discussions. Having *tables* >>> as the default vs having what's *currently in the spec* as the default. 
>>> The argument for having *tables* as the default is because we can't >>> assume that every REST server out there already supports views. >>> >>> >>> Can we assume that a server that does not declare capabilities does NOT >>> implement views? IMHO, that assumption is too strong and will break use >>> cases when the client is upgraded, but the server is not. >>> >>> Before capabilities were introduced, clients used to work in a certain >>> way. I think when the client starts interpreting capabilities, but the >>> server does not declare the capabilities property at all, the client should >>> (by default) work the same way as when it did not expect capabilities to be >>> declared. >>> >>> >>> Hence we're opting for the middle ground with *tables* + having a >>> *configurable >>> fallback mechanism*. Servers that already support views can configure >>> their clients to default to *tables / views*, meaning that no >>> additional (manual) configuration from a client's perspective is required >>> to get table & view behavior. >>> >>> >>> Forcing a server upgrade when users just want to upgrade the client is >>> too much of a burden, I think. Servers and clients are often managed by >>> different groups of people. >>> >>> In the end, IIRC previous posts in this thread correctly, declaring >>> server capabilities is an optimization to allow more efficient / less >>> error-prone client operation. I do not think it should impose additional >>> functional / interoperability requirements on servers. >>> >>> Cheers, >>> Dmitri. >>> >>> On Mon, Jul 15, 2024 at 10:11 AM Eduard Tudenhöfner < >>> etudenhoef...@apache.org> wrote: >>> >>>> Current servers do not send a `capabilities` field at all. You're >>>>> suggesting to use a new `rest-default-capabilities` property to let newer >>>>> clients assume `1`. Once the table/view/etc-spec capabilities are needed, >>>>> those newer clients would assume table-spec v1. That's wrong IMO. 
>>>> >>>> >>>> That statement I mentioned only applies to the capabilities that are >>>> currently in the PR and not to *table-spec / view-spec*. >>>> >>>> >>>> I'm not a fan of a `rest-default-capabilities` property at all, because >>>>> every user has to configure it explicitly and correctly >>>>> >>>> >>>> As I mentioned, servers can configure this for *all* of their clients >>>> via the *config* endpoint, so clients wouldn't have to do this >>>> *manually*. >>>> >>>> >>>> So I would argue to define the current set of APIs and specs as the >>>>> default if the `capabilities` field is missing. >>>> >>>> >>>> There have been two sides to this in prior discussions. Having *tables* >>>> as the default vs having what's *currently in the spec* as the >>>> default. The argument for having *tables* as the default is because we >>>> can't assume that every REST server out there already supports views. >>>> >>>> Hence we're opting for the middle ground with *tables* + having a >>>> *configurable >>>> fallback mechanism*. Servers that already support views can configure >>>> their clients to default to *tables / views*, meaning that no >>>> additional (manual) configuration from a client's perspective is required >>>> to get table & view behavior. >>>> >>>> Eduard >>>> >>>> On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote: >>>> >>>>> Sorry, I don't understand the two suggestions, especially when used in >>>>> combination. Current servers do not send a `capabilities` field at all. >>>>> You're suggesting to use a new `rest-default-capabilities` property to let >>>>> newer clients assume `1`. Once the table/view/etc-spec capabilities are >>>>> needed, those newer clients would assume table-spec v1. That's wrong IMO. >>>>> >>>>> I'm not a fan of a `rest-default-capabilities` property at all, >>>>> because every user has to configure it explicitly and correctly. 
>>>>> I predict quite some users not doing this or not doing it correctly, causing some trouble that can be prevented. The way things are configured is already quite complex, and adding yet another option adds more complexity to Iceberg. So I would argue to define the current set of APIs and specs as the default if the `capabilities` field is missing.
>>>>>
>>>>> Just because the *current* implementation doesn't use table-spec/view-spec doesn't mean near-future clients won't need it - table-spec v3 isn't that far away. And with new data types, view-spec v2 isn't far away either.
>>>>>
>>>>> Adding table-spec + view-spec capabilities now saves a lot of headaches for Iceberg users in the near future.
>>>>>
>>>>> On 15.07.24 11:27, Eduard Tudenhöfner wrote:
>>>>>
>>>>> I would suggest adding *table-spec / view-spec / udf-spec* capabilities later when new requirements/updates get added. The current implementation wouldn't make any use of these capabilities, so I don't see a good enough reason to add them at this point.
>>>>>
>>>>>> The PR currently says: "tables -> default capability in case the `capabilities` property doesn't exist or is empty in the response" - meaning: the server would _only_ support tables. This phrase in the spec proposal effectively removes the view functionality from all currently existing Iceberg REST implementations.
>>>>>
>>>>> This is why the configurable fallback mechanism was mentioned in the Catalog sync, which can be realized with *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting to version 1). A server could send that property via the config route without clients having to change anything.
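The configurable fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `effective_capabilities`, the constant `DEFAULT_FALLBACK`, and the shape of the capabilities map (capability name to list of versions) are hypothetical and are not taken from the spec or from PR #9940. Only the property name `rest-default-capabilities` and the *tables* default come from the thread.

```python
from typing import Dict, List, Optional

# Hypothetical default, mirroring the "tables (version 1)" fallback
# discussed in the thread.
DEFAULT_FALLBACK = "tables"


def effective_capabilities(
    server_capabilities: Optional[Dict[str, List[int]]],
    config: Dict[str, str],
) -> Dict[str, List[int]]:
    """Return the capabilities a client would assume for this server."""
    if server_capabilities:
        # The server declared capabilities explicitly; use them as-is.
        return server_capabilities
    # Field missing or empty: use the configurable fallback, which a server
    # could push to its clients through the config route.
    raw = config.get("rest-default-capabilities", DEFAULT_FALLBACK)
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Every fallback capability defaults to version 1.
    return {name: [1] for name in names}
```

With this shape, an older server that already supports views would only need to return `rest-default-capabilities=tables,views` from its config endpoint; clients themselves would not need any manual configuration.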
>>>>>
>>>>> On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I still have concerns regarding the missing table-spec/view-spec capabilities. Newer clients can send create/update requests with requirements/updates of newer Iceberg table/view/udf specs to a server that doesn't support those spec versions - the outcome is rather undefined. What should a server do? Ignore the unknown fields and requirement/update types and hence do what it's potentially _not_ supposed to do? Reply with a then-ambiguous 501 (is it the endpoint that's not implemented or the request content not supported)? Similarly, what if a server decides to not support for example table-spec v1 and just drops the manifest-file list in a table snapshot, leading to data loss?
>>>>>>
>>>>>> IMO capabilities must contain the table/view/... spec versions supported by the server.
>>>>>>
>>>>>> There's also the concern about the behavior if the `capabilities` field is missing (see https://github.com/apache/iceberg/pull/9940/files#r1676113409, not sure why the comment thread's resolved). The PR currently says: "tables -> default capability in case the `capabilities` property doesn't exist or is empty in the response" - meaning: the server would _only_ support tables. This phrase in the spec proposal effectively removes the view functionality from all currently existing Iceberg REST implementations.
>>>>>>
>>>>>> On 11.07.24 08:42, Eduard Tudenhöfner wrote:
>>>>>>
>>>>>> Are there any other concerns with the proposal or should we start a VOTE thread?
>>>>>>
>>>>>> Eduard
>>>>>>
>>>>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>>> Re: remote signing, I agree that it does not look like a server capability that a client can / should discover. It is more like something that the server instructs / configures the client to do.
>>>>>>>
>>>>>>> While a server can control this behavior and instruct the client to use remote signing, technically nothing is preventing a client from configuring s3.remote-signing-enabled=true. In such a case it seems more appropriate to indicate that this capability isn't supported rather than a generic 501, because not every server will support remote signing.
>>>>>>>
>>>>>>> Good point regarding clients taking initiative and using request signing without an explicit server-provided config. It moves the client operations into a mode where the server has more control (as opposed to having longer-term client-side credentials), so it looks like a reasonable mode to support from the security perspective.
>>>>>>>
>>>>>>> Let's keep that capability flag.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I've added a few inline comments below.
>>>>>>>>
>>>>>>>>> Re: remote signing, I agree that it does not look like a server capability that a client can / should discover. It is more like something that the server instructs / configures the client to do.
>>>>>>>> >>>>>>>> >>>>>>>> While a server can control this behavior and instruct the client to >>>>>>>> use remote signing, technically nothing is preventing a client from >>>>>>>> configuring s3.remote-signing-enabled=true. In such a case it >>>>>>>> seems more appropriate to indicate that this capability isn't supported >>>>>>>> rather than a generic 501, because not every server will support remote >>>>>>>> signing. >>>>>>>> >>>>>>>> The *vended-credentials* capability on the other hand is more >>>>>>>> informative in its nature and a server indeed configures a client. I >>>>>>>> think >>>>>>>> that was also one of the reasons I removed this capability but added it >>>>>>>> later back due to a comment from Jack. >>>>>>>> >>>>>>>> I'm ok either way in terms of removing / keeping >>>>>>>> *vended-credentials* as a capability but given that we'd want to >>>>>>>> include *actionable* capabilities at this point, I'd just remove >>>>>>>> it (nothing is preventing us from adding it later if necessary). >>>>>>>> >>>>>>>> >>>>>>>> In that case, why do we need all these other capabilities like >>>>>>>>> tables, remote-signing, etc. in the first place? >>>>>>>> >>>>>>>> >>>>>>>> Given that capabilities also carry versioning information, clients >>>>>>>> can make more informed decisions on which endpoints to call. One could >>>>>>>> argue that generally throwing a 501 on everything that isn't supported >>>>>>>> might be sufficient, but that doesn't necessarily help a client in >>>>>>>> knowing >>>>>>>> which versions of a capability are safe to call/use. >>>>>>>> >>>>>>>> Regarding the control of client-side fallback behavior: >>>>>>>> I think the default fallback behavior should be *tables* (with >>>>>>>> version 1) with a property in the REST catalog that allows configuring >>>>>>>> this >>>>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of >>>>>>>> them defaulting to version 1). 
>>>>>>>>
>>>>>>>> Eduard
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Yes I agree that sounds like a valid use case. So the criteria so far are that capabilities are used for:
>>>>>>>>> - controlling client-side fallback behavior
>>>>>>>>> - failing expensive operations early if we know they will eventually fail due to a missing capability
>>>>>>>>>
>>>>>>>>> Do we agree that these are the criteria we should use? What about the other capabilities, namely tables, remote-signing, credential-vending?
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>>>>
>>>>>>>>>> I think that it does in other cases. Multi-table commits, for example, are a building block for multi-statement transactions. If a service doesn't support multi-table commits then we ideally want clients to know that ahead of time so that they don't run a big transaction and then fail because the commit is not supported.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Re: remote signing, I agree that it does not look like a server capability that a client can / should discover. It is more like something that the server instructs / configures the client to do.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Dmitri.
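Ryan's multi-table commit point, together with the two client behaviors Jack lists at the top of the thread (fail early vs. fall back to single-table commits), could be sketched as below. All names here (`MissingMultiTableCommit`, `plan_commits`) are hypothetical and only illustrate the idea of deciding before any expensive work starts; nothing in this block is an actual Iceberg client API.

```python
from enum import Enum
from typing import List, Set


class MissingMultiTableCommit(Enum):
    """Hypothetical client policy for servers without multi-table commit."""
    FAIL_EARLY = "fail-early"
    SINGLE_TABLE_FALLBACK = "single-table-fallback"


def plan_commits(
    tables: List[str],
    capabilities: Set[str],
    on_missing: MissingMultiTableCommit,
) -> List[List[str]]:
    """Group tables into commit requests before the transaction runs."""
    if not tables:
        return []
    if len(tables) == 1 or "multi-table-commit" in capabilities:
        # A single request covers every table.
        return [list(tables)]
    if on_missing is MissingMultiTableCommit.FAIL_EARLY:
        # Surface the problem before running a large transaction.
        raise NotImplementedError("server does not declare multi-table-commit")
    # Fallback: one commit request per table.
    return [[table] for table in tables]
```

Note that the fallback branch issues separate commits, so the tables are no longer committed atomically as a group; whether that trade-off is acceptable is exactly the kind of non-binary choice the feature-flag argument is about.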
>>>>>>>>>>> >>>>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> I was reconciling the discussion yesterday, one point that was >>>>>>>>>>>> interesting to me was that we agreed the purpose of these >>>>>>>>>>>> capabilities is >>>>>>>>>>>> to "control client-side fallback behavior", or at least the client >>>>>>>>>>>> should >>>>>>>>>>>> behave differently based on these capabilities. However, this >>>>>>>>>>>> seems to be >>>>>>>>>>>> only needed so far for views, or more specifically, for loadView >>>>>>>>>>>> API only >>>>>>>>>>>> because it impacts the fallback behavior to resolve the identifier >>>>>>>>>>>> as a >>>>>>>>>>>> table or not. >>>>>>>>>>>> >>>>>>>>>>>> For all the other capabilities listed, and even the other >>>>>>>>>>>> endpoints in view, because a server can decide to implement it >>>>>>>>>>>> partially >>>>>>>>>>>> anyway and just document the behavior, does it make a difference >>>>>>>>>>>> if I >>>>>>>>>>>> declare the capability or not? The client will not stop the >>>>>>>>>>>> request, the >>>>>>>>>>>> server will just error out if it is not supported. Maybe the error >>>>>>>>>>>> is not >>>>>>>>>>>> in the expected code or message, but it is still an error. In that >>>>>>>>>>>> case, >>>>>>>>>>>> why do we need all these other capabilities like tables, >>>>>>>>>>>> remote-signing, >>>>>>>>>>>> etc. in the first place? >>>>>>>>>>>> >>>>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help >>>>>>>>>>>> describe how the other capabilities could be used beyond >>>>>>>>>>>> potentially >>>>>>>>>>>> returning an error earlier? 
>>>>>>>>>>>>
>>>>>>>>>>>> -Jack
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > I've also added the 501 error to the response of the respective endpoints but worth mentioning that *HEAD* / *GET* requests must not return a 501 <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this implies that the server impl would e.g. return a *404* in such a case).
>>>>>>>>>>>>>
>>>>>>>>>>>>> My reading on the Mozilla page makes me think that it is phrased too narrowly. Reading RFC 2616 [1] I believe that it does not preclude responding with 501 to GET and HEAD requests. I think it means that GET and HEAD methods must be supported by "general purpose" servers. The Iceberg REST server is not a general purpose server for resources. So, I think it should be fine to respond with 501 to unimplemented endpoints.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Dmitri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I watched the catalog sync recording today and updated the PR <https://github.com/apache/iceberg/pull/9940> to remove fine-grained capabilities like *register-table / table-metrics*.
>>>>>>>>>>>>>> >>>>>>>>>>>>>> The current capabilities (with versioning information) in the >>>>>>>>>>>>>> PR are: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - tables >>>>>>>>>>>>>> - views >>>>>>>>>>>>>> - remote-signing >>>>>>>>>>>>>> - vended-credentials >>>>>>>>>>>>>> - multi-table-commit >>>>>>>>>>>>>> >>>>>>>>>>>>>> For servers that only *partially* implement endpoints under >>>>>>>>>>>>>> a capability the spec requires the server to throw a *501 >>>>>>>>>>>>>> Not Implemented*. I've also added the 501 error to the >>>>>>>>>>>>>> response of the respective endpoints but worth mentioning that >>>>>>>>>>>>>> *HEAD* / *GET *requests must not return a 501 >>>>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> >>>>>>>>>>>>>> (this >>>>>>>>>>>>>> implies that the server impl would e.g. return a *404* in >>>>>>>>>>>>>> such a case). >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards >>>>>>>>>>>>>> Eduard >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré < >>>>>>>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Eduard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It makes sense to return 501 for servers which don't >>>>>>>>>>>>>>> implement all >>>>>>>>>>>>>>> endpoints. It means that the server will at least have to >>>>>>>>>>>>>>> implement >>>>>>>>>>>>>>> empty endpoints if needed (that makes sense to me). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think we should focus on only "identified capabilities". I >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>> that I proposed before that the capabilities can be >>>>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm >>>>>>>>>>>>>>> afraid we >>>>>>>>>>>>>>> won't be flexible enough or always behind the implementation >>>>>>>>>>>>>>> (if an >>>>>>>>>>>>>>> implementation wants to add "my-foo-cap"). 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>>>>>>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > I have clarified the wording in #9940 around the >>>>>>>>>>>>>>> requirement on having to implement all endpoints under a >>>>>>>>>>>>>>> particular >>>>>>>>>>>>>>> capability. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > For servers that only partially implement endpoints under >>>>>>>>>>>>>>> a capability the spec requires the server to throw a 501 Not >>>>>>>>>>>>>>> Implemented. >>>>>>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a >>>>>>>>>>>>>>> capability: I think this might make sense for the next >>>>>>>>>>>>>>> iteration of the >>>>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear >>>>>>>>>>>>>>> benefit for the >>>>>>>>>>>>>>> current REST spec as the client wouldn't do anything with that >>>>>>>>>>>>>>> information. >>>>>>>>>>>>>>> > If there is a clear benefit of having this, then this can >>>>>>>>>>>>>>> still be added later to the current REST spec but I believe we >>>>>>>>>>>>>>> should >>>>>>>>>>>>>>> rather have a few well-defined and actionable capabilities >>>>>>>>>>>>>>> rather than too >>>>>>>>>>>>>>> many. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > Eduard >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu < >>>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, >>>>>>>>>>>>>>> how do you envision this to be used? 
>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 approaches we can go with, taking CreateTable as an example:
>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means that POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating tables in the v3 version.
>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support both v2 and v3 fields, and the server enforces the payload differently based on the TableMetadata.format-version field. If the server does not support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be a capability. (1) seems to be cleaner, but has some overhead in provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec version?
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, since the /v1/tables operation supports both v1 and v3. We should only go to 2 when we have incompatible fields/breaking changes, according to the discussion.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> Generally I agree with adding table-spec to capabilities. For example, we can expose this to the user in the API so that the user can choose a supported table format version without an exception being thrown.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss.
>>>>>>>>>>>>>>> >>> Robert, how do you envision this to be used?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 approaches we can go with, taking CreateTable as an example:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> (1) increment the related operation version, which means that POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow creating tables in the v3 version.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> (2) update the existing table metadata model to support both v2 and v3 fields, and the server enforces the payload differently based on the TableMetadata.format-version field. If the server does not support v3, it can return unsupported at that time.
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be a capability. (1) seems to be cleaner, but has some overhead in provisioning a new endpoint compared to (2).
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec version?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I watched the recording today (thanks for providing that).
>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) >>>>>>>>>>>>>>> clients handle the case when a service does _not_ return the >>>>>>>>>>>>>>> capabilities >>>>>>>>>>>>>>> field (absent). My proposal would be that a client should in >>>>>>>>>>>>>>> this case >>>>>>>>>>>>>>> assume that all _currently_ existing capabilities are supported. >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> - tables: [1] >>>>>>>>>>>>>>> >>>>> - views: [1] >>>>>>>>>>>>>>> >>>>> - remote-signing: [1] >>>>>>>>>>>>>>> >>>>> - multi-table-commit: [1] >>>>>>>>>>>>>>> >>>>> - register-table: [1] >>>>>>>>>>>>>>> >>>>> - table-metrics: [1] >>>>>>>>>>>>>>> >>>>> - table-spec: [1,2] >>>>>>>>>>>>>>> >>>>> - view-spec: [1,2] >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>>> >>>>>>>>>>>>>>> >>>> The one thing I would like to add here is that the >>>>>>>>>>>>>>> current PR uses the tables capability (as version 1) as the >>>>>>>>>>>>>>> default when a >>>>>>>>>>>>>>> server doesn't return capabilities but it might be also ok to >>>>>>>>>>>>>>> include views >>>>>>>>>>>>>>> (as version 1) because the current client impl has some code to >>>>>>>>>>>>>>> deal with >>>>>>>>>>>>>>> errors in case endpoints don't exist. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Unless we agree that the currently existing >>>>>>>>>>>>>>> functionality in the REST spec is the default behavior to be >>>>>>>>>>>>>>> assumed for >>>>>>>>>>>>>>> older server, I'm not sure about including remote-signing / >>>>>>>>>>>>>>> multi-table-commit / register-table / table-metrics as it has >>>>>>>>>>>>>>> been >>>>>>>>>>>>>>> indicated in earlier comments on the PR/ML that not every REST >>>>>>>>>>>>>>> server >>>>>>>>>>>>>>> supports these. 
>>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> That being said, we should discuss whether we want the >>>>>>>>>>>>>>> default behavior (when an older server doesn't send back >>>>>>>>>>>>>>> capabilities) to be >>>>>>>>>>>>>>> >>>> a) tables (version 1) only >>>>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in >>>>>>>>>>>>>>> the REST spec (as version 1) >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems >>>>>>>>>>>>>>> to be more informative in its nature as I don't think a client >>>>>>>>>>>>>>> would act >>>>>>>>>>>>>>> differently right now when seeing these. >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>> Thanks >>>>>>>>>>>>>>> >>>> Eduard >>>>>>>>>>>>>>> >>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Ryan Blue >>>>>>>>>> Databricks >>>>>>>>>> >>>>>>>>> -- >>>>>> Robert Stupp >>>>>> @snazy >>>>>> >>>>>> -- >>>>> Robert Stupp >>>>> @snazy >>>>> >>>>>
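The versioned capability list quoted in the oldest message above (for example `table-spec: [1,2]`), and Renjie's remark about letting a user pick a supported table format version without hitting an exception, suggest a lookup along these lines. This is a sketch only: the function name `highest_supported` and the name-to-versions map are assumptions made for illustration, not part of the proposal.

```python
from typing import Dict, List, Optional


def highest_supported(
    capabilities: Dict[str, List[int]],
    name: str,
    client_versions: List[int],
) -> Optional[int]:
    """Return the newest version both sides support, or None if there is none."""
    # Versions the server declared for this capability (empty if undeclared).
    server_versions = set(capabilities.get(name, []))
    common = server_versions & set(client_versions)
    return max(common) if common else None
```

A client that can write table-spec versions 1 through 3 would get `2` back from a server declaring `table-spec: [1,2]`, and `None` for a capability the server never declared, at which point it could apply whichever fallback policy the thread eventually settles on.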