> > Current servers do not send a `capabilities` field at all. You're > suggesting to use a new `rest-default-capabilities` property to let newer > clients assume `1`. Once the table/view/etc-spec capabilities are needed, > those newer clients would assume table-spec v1. That's wrong IMO.
That statement I mentioned only applies to the capabilities that are currently in the PR and not to *table-spec / view-spec*. I'm not a fan of a `rest-default-capabilities` property at all, because > every user has to configure it explicitly and correctly > As I mentioned, servers can configure this for *all* of their clients via the *config* endpoint, so clients wouldn't have to do this *manually*. So I would argue to define the current set of APIs and specs as the default > if the `capabilities` field is missing. There have been two sides to this in prior discussions. Having *tables* as the default vs having what's *currently in the spec* as the default. The argument for having *tables* as the default is because we can't assume that every REST server out there already supports views. Hence we're opting for the middle ground with *tables* + having a *configurable fallback mechanism*. Servers that already support views can configure their clients to default to *tables / views*, meaning that no additional (manual) configuration from a client's perspective is required to get table & view behavior. Eduard On Mon, Jul 15, 2024 at 3:00 PM Robert Stupp <sn...@snazy.de> wrote: > Sorry, I don't understand the two suggestions, especially when used in > combination. Current servers do not send a `capabilities` field at all. > You're suggesting to use a new `rest-default-capabilities` property to let > newer clients assume `1`. Once the table/view/etc-spec capabilities are > needed, those newer clients would assume table-spec v1. That's wrong IMO. > > I'm not a fan of a `rest-default-capabilities` property at all, because > every user has to configure it explicitly and correctly. I predict quite > some users not doing this or not doing it correctly, causing some trouble > that can be prevented. The way things are configured is already quite > complex, and yet adding another option adds more complexity to Iceberg. So > I would argue to define the current set of APIs and specs as the default if > the `capabilities` field is missing. > > Just because the *current* implementation doesn't use table-spec/view-spec > doesn't mean near future clients would need it - table-spec v3 isn't that > far away. And with new data types, view-spec v2 isn't far away either. > > Adding table-spec + view-spec capabilities now saves a lot of headaches > for Iceberg users in the near future. > > > On 15.07.24 11:27, Eduard Tudenhöfner wrote: > > I would suggest adding *table-spec / view-spec / udf-spec *capabilities > later when new requirements/updates get added. The current implementation > wouldn't make any use of these capabilities, so I don't see a good enough > reason to add them at this point. > > The PR currently says: "tables -> default capability in case the >> `capabilities` property doesn't exist or is empty in the response" - >> meaning: the server would _only_ support tables. This phrase in the spec >> proposal effectively removes the view functionality from all currently >> existing Iceberg REST implementations. > > > This is why the configurable fallback mechanism was mentioned in the > Catalog sync, which can be realized with *r* > *est-default-capabilities=tables,views,abc,xyz* (all of them defaulting > to version 1). A server could send that property via the config route > without having clients to change anything. > > > On Mon, Jul 15, 2024 at 10:24 AM Robert Stupp <sn...@snazy.de> wrote: > >> Hi, >> >> I still have concerns regarding the missing table-spec/view-spec >> capabilities. Newer clients can send create/update requests with >> requirements/updates of newer Iceberg table/view/udf specs to a server that >> doesn't support those spec versions - the outcome is rather undefined. What >> should a server do? Ignore the unknown fields and requirement/update types >> and hence do what it's potentially _not_ supposed to do? Reply with a then >> ambiguous 501 (is it the endpoint that's not implemented or the request >> content not supported)? Similar, what if a server decides to not support >> for example table-spec v1 and just drop the manifest-file list in a table >> snapshot leading to data loss? >> >> IMO capabilities must contain the table/view/... spec versions supported >> by the server. >> >> There's also the concern about the behavior if the `capabilties` field is >> missing (see >> https://github.com/apache/iceberg/pull/9940/files#r1676113409, not sure >> why the comment thread's resolved). The PR currently says: "tables -> >> default capability in case the `capabilities` property doesn't exist or is >> empty in the response" - meaning: the server would _only_ support tables. >> This phrase in the spec proposal effectively removes the view functionality >> from all currently existing Iceberg REST implementations. >> >> >> On 11.07.24 08:42, Eduard Tudenhöfner wrote: >> >> Are there any other concerns with the proposal or should we start a VOTE >> thread? >> >> Eduard >> >> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov >> <dmitri.bourlatch...@dremio.com.invalid> >> <dmitri.bourlatch...@dremio.com.invalid> wrote: >> >>> Re: remote signing, I agree that it does not look like a server >>>> capability that a client can / should discover. It is more like something >>>> that the server instructs / configures the client to do. >>> >>> >>> While a server can control this behavior and instruct the client to use >>> remote signing, technically nothing is preventing a client from configuring >>> s3.remote-signing-enabled=true. In such a case it seems more >>> appropriate to indicate that this capability isn't supported rather than a >>> generic 501, because not every server will support remote signing. >>> >>> >>> Good point regarding clients taking initiative and using request singing >>> without an explicit server-provided config. It moves the client operations >>> into a mode where the server has more control (over having longer term >>> client-side credentials), so it looks like a reasonable mode to support >>> from the security perspective. >>> >>> Let's keep that capability flag. >>> >>> Cheers, >>> Dmitri. >>> >>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner < >>> etudenhoef...@apache.org> wrote: >>> >>>> Hey everyone, >>>> >>>> I've added a few inline comments below. >>>> >>>> >>>> >>>>> Re: remote signing, I agree that it does not look like a server >>>>> capability that a client can / should discover. It is more like something >>>>> that the server instructs / configures the client to do. >>>> >>>> >>>> While a server can control this behavior and instruct the client to use >>>> remote signing, technically nothing is preventing a client from configuring >>>> s3.remote-signing-enabled=true. In such a case it seems more >>>> appropriate to indicate that this capability isn't supported rather than a >>>> generic 501, because not every server will support remote signing. >>>> >>>> The *vended-credentials* capability on the other hand is more >>>> informative in its nature and a server indeed configures a client. I think >>>> that was also one of the reasons I removed this capability but added it >>>> later back due to a comment from Jack. >>>> >>>> I'm ok either way in terms of removing / keeping *vended-credentials* >>>> as a capability but given that we'd want to include *actionable* >>>> capabilities >>>> at this point, I'd just remove it (nothing is preventing us from adding it >>>> later if necessary). >>>> >>>> >>>> In that case, why do we need all these other capabilities like tables, >>>>> remote-signing, etc. in the first place? >>>> >>>> >>>> Given that capabilities also carry versioning information, clients can >>>> make more informed decisions on which endpoints to call. One could argue >>>> that generally throwing a 501 on everything that isn't supported might be >>>> sufficient, but that doesn't necessarily help a client in knowing which >>>> versions of a capability are safe to call/use. >>>> >>>> Regarding the control of client-side fallback behavior: >>>> I think the default fallback behavior should be *tables* (with version >>>> 1) with a property in the REST catalog that allows configuring this to e.g. >>>> *rest-default-capabilities=tables,views,abc,xyz* (all of them >>>> defaulting to version 1). >>>> >>>> >>>> Eduard >>>> >>>> >>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote: >>>> >>>>> Yes I agree that sounds like a valid use case. So the criteria so far >>>>> is that capabilities are used for: >>>>> - controlling client-side fallback behavior >>>>> - failing expensive operations early if we know it will eventually >>>>> fail due to missing capability >>>>> >>>>> Do we agree if this is the criteria we should use? What about the >>>>> other capabilities, namly tables, remote-signing, credential-vending? >>>>> >>>>> -Jack >>>>> >>>>> >>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid> >>>>> <b...@databricks.com.invalid> wrote: >>>>> >>>>>> > does it make a difference if I declare the capability or not? >>>>>> >>>>>> I think that it does in other cases. Multi-table commits, for >>>>>> example, are a building block for multi-statement transactions. If a >>>>>> service doesn't support multi-table commits then we ideally want clients >>>>>> to >>>>>> know that ahead of time so that they don't run a big transaction and then >>>>>> fail because the commit is not supported. >>>>>> >>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov >>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>> >>>>>>> Re: remote signing, I agree that it does not look like a server >>>>>>> capability that a client can / should discover. It is more like >>>>>>> something >>>>>>> that the server instructs / configures the client to do. >>>>>>> >>>>>>> Cheers, >>>>>>> Dmitri. >>>>>>> >>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote: >>>>>>> >>>>>>>> I was reconciling the discussion yesterday, one point that was >>>>>>>> interesting to me was that we agreed the purpose of these capabilities >>>>>>>> is >>>>>>>> to "control client-side fallback behavior", or at least the client >>>>>>>> should >>>>>>>> behave differently based on these capabilities. However, this seems to >>>>>>>> be >>>>>>>> only needed so far for views, or more specifically, for loadView API >>>>>>>> only >>>>>>>> because it impacts the fallback behavior to resolve the identifier as a >>>>>>>> table or not. >>>>>>>> >>>>>>>> For all the other capabilities listed, and even the other endpoints >>>>>>>> in view, because a server can decide to implement it partially anyway >>>>>>>> and >>>>>>>> just document the behavior, does it make a difference if I declare the >>>>>>>> capability or not? The client will not stop the request, the server >>>>>>>> will >>>>>>>> just error out if it is not supported. Maybe the error is not in the >>>>>>>> expected code or message, but it is still an error. In that case, why >>>>>>>> do we >>>>>>>> need all these other capabilities like tables, remote-signing, etc. in >>>>>>>> the >>>>>>>> first place? >>>>>>>> >>>>>>>> Maybe it is too extreme of a thought, but could anyone help >>>>>>>> describe how the other capabilities could be used beyond potentially >>>>>>>> returning an error earlier? >>>>>>>> >>>>>>>> -Jack >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov >>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> >>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>>> >>>>>>>>> Hi Eduard, >>>>>>>>> >>>>>>>>> > I've also added the 501 error to the response of the respective >>>>>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must >>>>>>>>> not return a 501 >>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>>>>> case). >>>>>>>>> >>>>>>>>> My reading on the Mozilla page makes me think that it is phrased >>>>>>>>> too narrowly. Reading RFC 2616 [1] I believe that it does not preclude >>>>>>>>> responding with 501 to GET and HEAD requests. I think it means that >>>>>>>>> GET and >>>>>>>>> HEAD methods must be supported by "general purpose" servers. The >>>>>>>>> Iceberg >>>>>>>>> REST server is not a general purpose server for resources. So, I >>>>>>>>> think it >>>>>>>>> should be fine to respond with 501 to unimplemented endpoints. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Dmitri. >>>>>>>>> >>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1 >>>>>>>>> >>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner < >>>>>>>>> etudenhoef...@apache.org> wrote: >>>>>>>>> >>>>>>>>>> Hey everyone, >>>>>>>>>> >>>>>>>>>> I watched the catalog sync recording today and updated the PR >>>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove >>>>>>>>>> fine-grained capabilities like *register-table / table-metrics*. >>>>>>>>>> >>>>>>>>>> The current capabilities (with versioning information) in the PR >>>>>>>>>> are: >>>>>>>>>> >>>>>>>>>> - tables >>>>>>>>>> - views >>>>>>>>>> - remote-signing >>>>>>>>>> - vended-credentials >>>>>>>>>> - multi-table-commit >>>>>>>>>> >>>>>>>>>> For servers that only *partially* implement endpoints under a >>>>>>>>>> capability the spec requires the server to throw a *501 Not >>>>>>>>>> Implemented*. I've also added the 501 error to the response of >>>>>>>>>> the respective endpoints but worth mentioning that *HEAD* / *GET >>>>>>>>>> *requests must not return a 501 >>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>>>>>> case). >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards >>>>>>>>>> Eduard >>>>>>>>>> >>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré < >>>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Eduard, >>>>>>>>>>> >>>>>>>>>>> It makes sense to return 501 for servers which don't implement >>>>>>>>>>> all >>>>>>>>>>> endpoints. It means that the server will at least have to >>>>>>>>>>> implement >>>>>>>>>>> empty endpoints if needed (that makes sense to me). >>>>>>>>>>> >>>>>>>>>>> I think we should focus on only "identified capabilities". I >>>>>>>>>>> think >>>>>>>>>>> that I proposed before that the capabilities can be >>>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid we >>>>>>>>>>> won't be flexible enough or always behind the implementation (if >>>>>>>>>>> an >>>>>>>>>>> implementation wants to add "my-foo-cap"). >>>>>>>>>>> >>>>>>>>>>> Regards >>>>>>>>>>> JB >>>>>>>>>>> >>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>>>>> > >>>>>>>>>>> > I have clarified the wording in #9940 around the requirement >>>>>>>>>>> on having to implement all endpoints under a particular capability. >>>>>>>>>>> > >>>>>>>>>>> > For servers that only partially implement endpoints under a >>>>>>>>>>> capability the spec requires the server to throw a 501 Not >>>>>>>>>>> Implemented. >>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that. >>>>>>>>>>> > >>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a >>>>>>>>>>> capability: I think this might make sense for the next iteration of >>>>>>>>>>> the >>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit >>>>>>>>>>> for the >>>>>>>>>>> current REST spec as the client wouldn't do anything with that >>>>>>>>>>> information. >>>>>>>>>>> > If there is a clear benefit of having this, then this can >>>>>>>>>>> still be added later to the current REST spec but I believe we >>>>>>>>>>> should >>>>>>>>>>> rather have a few well-defined and actionable capabilities rather >>>>>>>>>>> than too >>>>>>>>>>> many. >>>>>>>>>>> > >>>>>>>>>>> > Eduard >>>>>>>>>>> > >>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu < >>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how >>>>>>>>>>> do you envision this to be used? >>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are >>>>>>>>>>> 2 approaches we can go with, taking CreateTable as an example: >>>>>>>>>>> >>> (1) increment the related operation version, which means >>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created and >>>>>>>>>>> allow >>>>>>>>>>> creating tables in the v3 version. >>>>>>>>>>> >>> (2) update the existing table metadata model to support both >>>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently >>>>>>>>>>> based on >>>>>>>>>>> the TableMetadata.format-version field. If the server does not >>>>>>>>>>> support v3, >>>>>>>>>>> it can return unsupported at that time. >>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be >>>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in >>>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>>>>> version? >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, >>>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should only >>>>>>>>>>> go to 2 >>>>>>>>>>> only when we have incompatible fields/break changes according to >>>>>>>>>>> discussion. >>>>>>>>>>> >> >>>>>>>>>>> >> Generally I agree with adding table-spec into capabilities. >>>>>>>>>>> For example, we can expose this to user in api so that user could >>>>>>>>>>> choose a >>>>>>>>>>> supported table format version without throwing exception. >>>>>>>>>>> >> >>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>> >>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how >>>>>>>>>>> do you envision this to be used? >>>>>>>>>>> >>> >>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are >>>>>>>>>>> 2 approaches we can go with, taking CreateTable as an example: >>>>>>>>>>> >>> >>>>>>>>>>> >>> (1) increment the related operation version, which means >>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created and >>>>>>>>>>> allow >>>>>>>>>>> creating tables in the v3 version. >>>>>>>>>>> >>> >>>>>>>>>>> >>> (2) update the existing table metadata model to support both >>>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently >>>>>>>>>>> based on >>>>>>>>>>> the TableMetadata.format-version field. If the server does not >>>>>>>>>>> support v3, >>>>>>>>>>> it can return unsupported at that time. >>>>>>>>>>> >>> >>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be >>>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in >>>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>>> >>> >>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>>>>> version? >>>>>>>>>>> >>> >>>>>>>>>>> >>> -Jack >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner >>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> >>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote: >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday >>>>>>>>>>> but I watched the recording today (thanks for providing that). >>>>>>>>>>> >>>> >>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients >>>>>>>>>>> handle the case when a service does _not_ return the capabilities >>>>>>>>>>> field >>>>>>>>>>> (absent). My proposal would be that a client should in this case >>>>>>>>>>> assume >>>>>>>>>>> that all _currently_ existing capabilities are supported. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> - tables: [1] >>>>>>>>>>> >>>>> - views: [1] >>>>>>>>>>> >>>>> - remote-signing: [1] >>>>>>>>>>> >>>>> - multi-table-commit: [1] >>>>>>>>>>> >>>>> - register-table: [1] >>>>>>>>>>> >>>>> - table-metrics: [1] >>>>>>>>>>> >>>>> - table-spec: [1,2] >>>>>>>>>>> >>>>> - view-spec: [1,2] >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>> The one thing I would like to add here is that the current >>>>>>>>>>> PR uses the tables capability (as version 1) as the default when a >>>>>>>>>>> server >>>>>>>>>>> doesn't return capabilities but it might be also ok to include >>>>>>>>>>> views (as >>>>>>>>>>> version 1) because the current client impl has some code to deal >>>>>>>>>>> with >>>>>>>>>>> errors in case endpoints don't exist. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Unless we agree that the currently existing functionality >>>>>>>>>>> in the REST spec is the default behavior to be assumed for older >>>>>>>>>>> server, >>>>>>>>>>> I'm not sure about including remote-signing / multi-table-commit / >>>>>>>>>>> register-table / table-metrics as it has been indicated in earlier >>>>>>>>>>> comments >>>>>>>>>>> on the PR/ML that not every REST server supports these. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> That being said, we should discuss whether we want the >>>>>>>>>>> default behavior (when an older server doesn't send back >>>>>>>>>>> capabilities) to be >>>>>>>>>>> >>>> a) tables (version 1) only >>>>>>>>>>> >>>> b) the currently existing functionality as defined in the >>>>>>>>>>> REST spec (as version 1) >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems to >>>>>>>>>>> be more informative in its nature as I don't think a client would >>>>>>>>>>> act >>>>>>>>>>> differently right now when seeing these. >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> Thanks >>>>>>>>>>> >>>> Eduard >>>>>>>>>>> >>>> >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>>> -- >>>>>> Ryan Blue >>>>>> Databricks >>>>>> >>>>> -- >> Robert Stupp >> @snazy >> >> -- > Robert Stupp > @snazy > >