Re: remote signing, I agree that it does not look like a server capability > that a client can / should discover. It is more like something that the > server instructs / configures the client to do.
While a server can control this behavior and instruct the client to use remote signing, technically nothing is preventing a client from configuring s3.remote-signing-enabled=true. In such a case it seems more appropriate to indicate that this capability isn't supported rather than a generic 501, because not every server will support remote signing. Good point regarding clients taking initiative and using request singing without an explicit server-provided config. It moves the client operations into a mode where the server has more control (over having longer term client-side credentials), so it looks like a reasonable mode to support from the security perspective. Let's keep that capability flag. Cheers, Dmitri. On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote: > Hey everyone, > > I've added a few inline comments below. > > > >> Re: remote signing, I agree that it does not look like a server >> capability that a client can / should discover. It is more like something >> that the server instructs / configures the client to do. > > > While a server can control this behavior and instruct the client to use > remote signing, technically nothing is preventing a client from configuring > s3.remote-signing-enabled=true. In such a case it seems more > appropriate to indicate that this capability isn't supported rather than a > generic 501, because not every server will support remote signing. > > The *vended-credentials* capability on the other hand is more informative > in its nature and a server indeed configures a client. I think that was > also one of the reasons I removed this capability but added it later back > due to a comment from Jack. > > I'm ok either way in terms of removing / keeping *vended-credentials* as > a capability but given that we'd want to include *actionable* capabilities > at this point, I'd just remove it (nothing is preventing us from adding it > later if necessary). > > > In that case, why do we need all these other capabilities like tables, >> remote-signing, etc. in the first place? > > > Given that capabilities also carry versioning information, clients can > make more informed decisions on which endpoints to call. One could argue > that generally throwing a 501 on everything that isn't supported might be > sufficient, but that doesn't necessarily help a client in knowing which > versions of a capability are safe to call/use. > > Regarding the control of client-side fallback behavior: > I think the default fallback behavior should be *tables* (with version 1) > with a property in the REST catalog that allows configuring this to e.g. > *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting > to version 1). > > > Eduard > > > On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote: > >> Yes I agree that sounds like a valid use case. So the criteria so far is >> that capabilities are used for: >> - controlling client-side fallback behavior >> - failing expensive operations early if we know it will eventually fail >> due to missing capability >> >> Do we agree if this is the criteria we should use? What about the other >> capabilities, namly tables, remote-signing, credential-vending? >> >> -Jack >> >> >> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid> >> wrote: >> >>> > does it make a difference if I declare the capability or not? >>> >>> I think that it does in other cases. Multi-table commits, for example, >>> are a building block for multi-statement transactions. If a service doesn't >>> support multi-table commits then we ideally want clients to know that ahead >>> of time so that they don't run a big transaction and then fail because the >>> commit is not supported. >>> >>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov >>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>> >>>> Re: remote signing, I agree that it does not look like a server >>>> capability that a client can / should discover. It is more like something >>>> that the server instructs / configures the client to do. >>>> >>>> Cheers, >>>> Dmitri. >>>> >>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote: >>>> >>>>> I was reconciling the discussion yesterday, one point that was >>>>> interesting to me was that we agreed the purpose of these capabilities is >>>>> to "control client-side fallback behavior", or at least the client should >>>>> behave differently based on these capabilities. However, this seems to be >>>>> only needed so far for views, or more specifically, for loadView API only >>>>> because it impacts the fallback behavior to resolve the identifier as a >>>>> table or not. >>>>> >>>>> For all the other capabilities listed, and even the other endpoints in >>>>> view, because a server can decide to implement it partially anyway and >>>>> just >>>>> document the behavior, does it make a difference if I declare the >>>>> capability or not? The client will not stop the request, the server will >>>>> just error out if it is not supported. Maybe the error is not in the >>>>> expected code or message, but it is still an error. In that case, why do >>>>> we >>>>> need all these other capabilities like tables, remote-signing, etc. in the >>>>> first place? >>>>> >>>>> Maybe it is too extreme of a thought, but could anyone help describe >>>>> how the other capabilities could be used beyond potentially returning an >>>>> error earlier? >>>>> >>>>> -Jack >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov >>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>> >>>>>> Hi Eduard, >>>>>> >>>>>> > I've also added the 501 error to the response of the respective >>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must not >>>>>> return a 501 >>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>> case). >>>>>> >>>>>> My reading on the Mozilla page makes me think that it is phrased too >>>>>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude >>>>>> responding with 501 to GET and HEAD requests. I think it means that GET >>>>>> and >>>>>> HEAD methods must be supported by "general purpose" servers. The Iceberg >>>>>> REST server is not a general purpose server for resources. So, I think it >>>>>> should be fine to respond with 501 to unimplemented endpoints. >>>>>> >>>>>> Cheers, >>>>>> Dmitri. >>>>>> >>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1 >>>>>> >>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner < >>>>>> etudenhoef...@apache.org> wrote: >>>>>> >>>>>>> Hey everyone, >>>>>>> >>>>>>> I watched the catalog sync recording today and updated the PR >>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove >>>>>>> fine-grained capabilities like *register-table / table-metrics*. >>>>>>> >>>>>>> The current capabilities (with versioning information) in the PR are: >>>>>>> >>>>>>> - tables >>>>>>> - views >>>>>>> - remote-signing >>>>>>> - vended-credentials >>>>>>> - multi-table-commit >>>>>>> >>>>>>> For servers that only *partially* implement endpoints under a >>>>>>> capability the spec requires the server to throw a *501 Not >>>>>>> Implemented*. I've also added the 501 error to the response of the >>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET *requests >>>>>>> must not return a 501 >>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>>> case). >>>>>>> >>>>>>> >>>>>>> Regards >>>>>>> Eduard >>>>>>> >>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <j...@nanthrax.net> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Eduard, >>>>>>>> >>>>>>>> It makes sense to return 501 for servers which don't implement all >>>>>>>> endpoints. It means that the server will at least have to implement >>>>>>>> empty endpoints if needed (that makes sense to me). >>>>>>>> >>>>>>>> I think we should focus on only "identified capabilities". I think >>>>>>>> that I proposed before that the capabilities can be >>>>>>>> overridden/provided by server implementation. Else, I'm afraid we >>>>>>>> won't be flexible enough or always behind the implementation (if an >>>>>>>> implementation wants to add "my-foo-cap"). >>>>>>>> >>>>>>>> Regards >>>>>>>> JB >>>>>>>> >>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>> > >>>>>>>> > I have clarified the wording in #9940 around the requirement on >>>>>>>> having to implement all endpoints under a particular capability. >>>>>>>> > >>>>>>>> > For servers that only partially implement endpoints under a >>>>>>>> capability the spec requires the server to throw a 501 Not Implemented. >>>>>>>> This was suggested by Jack and it seems reasonable to do that. >>>>>>>> > >>>>>>>> > Regarding the inclusion of table-spec / view-spec as a >>>>>>>> capability: I think this might make sense for the next iteration of the >>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit for >>>>>>>> the >>>>>>>> current REST spec as the client wouldn't do anything with that >>>>>>>> information. >>>>>>>> > If there is a clear benefit of having this, then this can still >>>>>>>> be added later to the current REST spec but I believe we should rather >>>>>>>> have >>>>>>>> a few well-defined and actionable capabilities rather than too many. >>>>>>>> > >>>>>>>> > Eduard >>>>>>>> > >>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu < >>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>> >>> >>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do >>>>>>>> you envision this to be used? >>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>>>>> approaches we can go with, taking CreateTable as an example: >>>>>>>> >>> (1) increment the related operation version, which means that >>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow >>>>>>>> creating >>>>>>>> tables in the v3 version. >>>>>>>> >>> (2) update the existing table metadata model to support both v2 >>>>>>>> and v3 fields, and the server enforces the payload differently based >>>>>>>> on the >>>>>>>> TableMetadata.format-version field. If the server does not support v3, >>>>>>>> it >>>>>>>> can return unsupported at that time. >>>>>>>> >>> Either way we go, the table-spec version does not need to be a >>>>>>>> capability. (1) seems to be cleaner, but has some overhead in >>>>>>>> provisioning >>>>>>>> a new endpoint compared to (2). >>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>> version? >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, since >>>>>>>> /v1/tables operation supports both v1 and v3. We should only go to 2 >>>>>>>> only >>>>>>>> when we have incompatible fields/break changes according to discussion. >>>>>>>> >> >>>>>>>> >> Generally I agree with adding table-spec into capabilities. For >>>>>>>> example, we can expose this to user in api so that user could choose a >>>>>>>> supported table format version without throwing exception. >>>>>>>> >> >>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>> >>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how do >>>>>>>> you envision this to be used? >>>>>>>> >>> >>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>>>>> approaches we can go with, taking CreateTable as an example: >>>>>>>> >>> >>>>>>>> >>> (1) increment the related operation version, which means that >>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow >>>>>>>> creating >>>>>>>> tables in the v3 version. >>>>>>>> >>> >>>>>>>> >>> (2) update the existing table metadata model to support both v2 >>>>>>>> and v3 fields, and the server enforces the payload differently based >>>>>>>> on the >>>>>>>> TableMetadata.format-version field. If the server does not support v3, >>>>>>>> it >>>>>>>> can return unsupported at that time. >>>>>>>> >>> >>>>>>>> >>> Either way we go, the table-spec version does not need to be a >>>>>>>> capability. (1) seems to be cleaner, but has some overhead in >>>>>>>> provisioning >>>>>>>> a new endpoint compared to (2). >>>>>>>> >>> >>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>> version? >>>>>>>> >>> >>>>>>>> >>> -Jack >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner >>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote: >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but I >>>>>>>> watched the recording today (thanks for providing that). >>>>>>>> >>>> >>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients >>>>>>>> handle the case when a service does _not_ return the capabilities field >>>>>>>> (absent). My proposal would be that a client should in this case assume >>>>>>>> that all _currently_ existing capabilities are supported. >>>>>>>> >>>>> >>>>>>>> >>>>> - tables: [1] >>>>>>>> >>>>> - views: [1] >>>>>>>> >>>>> - remote-signing: [1] >>>>>>>> >>>>> - multi-table-commit: [1] >>>>>>>> >>>>> - register-table: [1] >>>>>>>> >>>>> - table-metrics: [1] >>>>>>>> >>>>> - table-spec: [1,2] >>>>>>>> >>>>> - view-spec: [1,2] >>>>>>>> >>>>> >>>>>>>> >>>>> >>>>>>>> >>>> The one thing I would like to add here is that the current PR >>>>>>>> uses the tables capability (as version 1) as the default when a server >>>>>>>> doesn't return capabilities but it might be also ok to include views >>>>>>>> (as >>>>>>>> version 1) because the current client impl has some code to deal with >>>>>>>> errors in case endpoints don't exist. >>>>>>>> >>>> >>>>>>>> >>>> Unless we agree that the currently existing functionality in >>>>>>>> the REST spec is the default behavior to be assumed for older server, >>>>>>>> I'm >>>>>>>> not sure about including remote-signing / multi-table-commit / >>>>>>>> register-table / table-metrics as it has been indicated in earlier >>>>>>>> comments >>>>>>>> on the PR/ML that not every REST server supports these. >>>>>>>> >>>> >>>>>>>> >>>> That being said, we should discuss whether we want the default >>>>>>>> behavior (when an older server doesn't send back capabilities) to be >>>>>>>> >>>> a) tables (version 1) only >>>>>>>> >>>> b) the currently existing functionality as defined in the REST >>>>>>>> spec (as version 1) >>>>>>>> >>>> >>>>>>>> >>>> >>>>>>>> >>>> On another note: Including table-spec / view-spec seems to be >>>>>>>> more informative in its nature as I don't think a client would act >>>>>>>> differently right now when seeing these. >>>>>>>> >>>> >>>>>>>> >>>> Thanks >>>>>>>> >>>> Eduard >>>>>>>> >>>> >>>>>>>> >>>>>>> >>> >>> -- >>> Ryan Blue >>> Databricks >>> >>