Sorry I will take a look at the new comments later today. -Jack
On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote: > Are there any other concerns with the proposal or should we start a VOTE > thread? > > Eduard > > On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov > <dmitri.bourlatch...@dremio.com.invalid> wrote: > >> Re: remote signing, I agree that it does not look like a server >>> capability that a client can / should discover. It is more like something >>> that the server instructs / configures the client to do. >> >> >> While a server can control this behavior and instruct the client to use >> remote signing, technically nothing is preventing a client from configuring >> s3.remote-signing-enabled=true. In such a case it seems more >> appropriate to indicate that this capability isn't supported rather than a >> generic 501, because not every server will support remote signing. >> >> >> Good point regarding clients taking initiative and using request singing >> without an explicit server-provided config. It moves the client operations >> into a mode where the server has more control (over having longer term >> client-side credentials), so it looks like a reasonable mode to support >> from the security perspective. >> >> Let's keep that capability flag. >> >> Cheers, >> Dmitri. >> >> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrote: >> >>> Hey everyone, >>> >>> I've added a few inline comments below. >>> >>> >>> >>>> Re: remote signing, I agree that it does not look like a server >>>> capability that a client can / should discover. It is more like something >>>> that the server instructs / configures the client to do. >>> >>> >>> While a server can control this behavior and instruct the client to use >>> remote signing, technically nothing is preventing a client from configuring >>> s3.remote-signing-enabled=true. In such a case it seems more >>> appropriate to indicate that this capability isn't supported rather than a >>> generic 501, because not every server will support remote signing. >>> >>> The *vended-credentials* capability on the other hand is more >>> informative in its nature and a server indeed configures a client. I think >>> that was also one of the reasons I removed this capability but added it >>> later back due to a comment from Jack. >>> >>> I'm ok either way in terms of removing / keeping *vended-credentials* >>> as a capability but given that we'd want to include *actionable* >>> capabilities >>> at this point, I'd just remove it (nothing is preventing us from adding it >>> later if necessary). >>> >>> >>> In that case, why do we need all these other capabilities like tables, >>>> remote-signing, etc. in the first place? >>> >>> >>> Given that capabilities also carry versioning information, clients can >>> make more informed decisions on which endpoints to call. One could argue >>> that generally throwing a 501 on everything that isn't supported might be >>> sufficient, but that doesn't necessarily help a client in knowing which >>> versions of a capability are safe to call/use. >>> >>> Regarding the control of client-side fallback behavior: >>> I think the default fallback behavior should be *tables* (with version >>> 1) with a property in the REST catalog that allows configuring this to e.g. >>> *rest-default-capabilities=tables,views,abc,xyz* (all of them >>> defaulting to version 1). >>> >>> >>> Eduard >>> >>> >>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote: >>> >>>> Yes I agree that sounds like a valid use case. So the criteria so far >>>> is that capabilities are used for: >>>> - controlling client-side fallback behavior >>>> - failing expensive operations early if we know it will eventually fail >>>> due to missing capability >>>> >>>> Do we agree if this is the criteria we should use? What about the other >>>> capabilities, namly tables, remote-signing, credential-vending? >>>> >>>> -Jack >>>> >>>> >>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid> >>>> wrote: >>>> >>>>> > does it make a difference if I declare the capability or not? >>>>> >>>>> I think that it does in other cases. Multi-table commits, for example, >>>>> are a building block for multi-statement transactions. If a service >>>>> doesn't >>>>> support multi-table commits then we ideally want clients to know that >>>>> ahead >>>>> of time so that they don't run a big transaction and then fail because the >>>>> commit is not supported. >>>>> >>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov >>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>> >>>>>> Re: remote signing, I agree that it does not look like a server >>>>>> capability that a client can / should discover. It is more like something >>>>>> that the server instructs / configures the client to do. >>>>>> >>>>>> Cheers, >>>>>> Dmitri. >>>>>> >>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote: >>>>>> >>>>>>> I was reconciling the discussion yesterday, one point that was >>>>>>> interesting to me was that we agreed the purpose of these capabilities >>>>>>> is >>>>>>> to "control client-side fallback behavior", or at least the client >>>>>>> should >>>>>>> behave differently based on these capabilities. However, this seems to >>>>>>> be >>>>>>> only needed so far for views, or more specifically, for loadView API >>>>>>> only >>>>>>> because it impacts the fallback behavior to resolve the identifier as a >>>>>>> table or not. >>>>>>> >>>>>>> For all the other capabilities listed, and even the other endpoints >>>>>>> in view, because a server can decide to implement it partially anyway >>>>>>> and >>>>>>> just document the behavior, does it make a difference if I declare the >>>>>>> capability or not? The client will not stop the request, the server will >>>>>>> just error out if it is not supported. Maybe the error is not in the >>>>>>> expected code or message, but it is still an error. In that case, why >>>>>>> do we >>>>>>> need all these other capabilities like tables, remote-signing, etc. in >>>>>>> the >>>>>>> first place? >>>>>>> >>>>>>> Maybe it is too extreme of a thought, but could anyone help describe >>>>>>> how the other capabilities could be used beyond potentially returning an >>>>>>> error earlier? >>>>>>> >>>>>>> -Jack >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov >>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote: >>>>>>> >>>>>>>> Hi Eduard, >>>>>>>> >>>>>>>> > I've also added the 501 error to the response of the respective >>>>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must >>>>>>>> not return a 501 >>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>>>> case). >>>>>>>> >>>>>>>> My reading on the Mozilla page makes me think that it is phrased >>>>>>>> too narrowly. Reading RFC 2616 [1] I believe that it does not preclude >>>>>>>> responding with 501 to GET and HEAD requests. I think it means that >>>>>>>> GET and >>>>>>>> HEAD methods must be supported by "general purpose" servers. The >>>>>>>> Iceberg >>>>>>>> REST server is not a general purpose server for resources. So, I think >>>>>>>> it >>>>>>>> should be fine to respond with 501 to unimplemented endpoints. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Dmitri. >>>>>>>> >>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1 >>>>>>>> >>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner < >>>>>>>> etudenhoef...@apache.org> wrote: >>>>>>>> >>>>>>>>> Hey everyone, >>>>>>>>> >>>>>>>>> I watched the catalog sync recording today and updated the PR >>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove >>>>>>>>> fine-grained capabilities like *register-table / table-metrics*. >>>>>>>>> >>>>>>>>> The current capabilities (with versioning information) in the PR >>>>>>>>> are: >>>>>>>>> >>>>>>>>> - tables >>>>>>>>> - views >>>>>>>>> - remote-signing >>>>>>>>> - vended-credentials >>>>>>>>> - multi-table-commit >>>>>>>>> >>>>>>>>> For servers that only *partially* implement endpoints under a >>>>>>>>> capability the spec requires the server to throw a *501 Not >>>>>>>>> Implemented*. I've also added the 501 error to the response of >>>>>>>>> the respective endpoints but worth mentioning that *HEAD* / *GET >>>>>>>>> *requests >>>>>>>>> must not return a 501 >>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this >>>>>>>>> implies that the server impl would e.g. return a *404* in such a >>>>>>>>> case). >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards >>>>>>>>> Eduard >>>>>>>>> >>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré < >>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>> >>>>>>>>>> Hi Eduard, >>>>>>>>>> >>>>>>>>>> It makes sense to return 501 for servers which don't implement all >>>>>>>>>> endpoints. It means that the server will at least have to >>>>>>>>>> implement >>>>>>>>>> empty endpoints if needed (that makes sense to me). >>>>>>>>>> >>>>>>>>>> I think we should focus on only "identified capabilities". I think >>>>>>>>>> that I proposed before that the capabilities can be >>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid we >>>>>>>>>> won't be flexible enough or always behind the implementation (if >>>>>>>>>> an >>>>>>>>>> implementation wants to add "my-foo-cap"). >>>>>>>>>> >>>>>>>>>> Regards >>>>>>>>>> JB >>>>>>>>>> >>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner >>>>>>>>>> <etudenhoef...@apache.org> wrote: >>>>>>>>>> > >>>>>>>>>> > I have clarified the wording in #9940 around the requirement on >>>>>>>>>> having to implement all endpoints under a particular capability. >>>>>>>>>> > >>>>>>>>>> > For servers that only partially implement endpoints under a >>>>>>>>>> capability the spec requires the server to throw a 501 Not >>>>>>>>>> Implemented. >>>>>>>>>> This was suggested by Jack and it seems reasonable to do that. >>>>>>>>>> > >>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a >>>>>>>>>> capability: I think this might make sense for the next iteration of >>>>>>>>>> the >>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit >>>>>>>>>> for the >>>>>>>>>> current REST spec as the client wouldn't do anything with that >>>>>>>>>> information. >>>>>>>>>> > If there is a clear benefit of having this, then this can still >>>>>>>>>> be added later to the current REST spec but I believe we should >>>>>>>>>> rather have >>>>>>>>>> a few well-defined and actionable capabilities rather than too many. >>>>>>>>>> > >>>>>>>>>> > Eduard >>>>>>>>>> > >>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu < >>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>> >>> >>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how >>>>>>>>>> do you envision this to be used? >>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>>>>>>> approaches we can go with, taking CreateTable as an example: >>>>>>>>>> >>> (1) increment the related operation version, which means that >>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow >>>>>>>>>> creating >>>>>>>>>> tables in the v3 version. >>>>>>>>>> >>> (2) update the existing table metadata model to support both >>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently >>>>>>>>>> based on >>>>>>>>>> the TableMetadata.format-version field. If the server does not >>>>>>>>>> support v3, >>>>>>>>>> it can return unsupported at that time. >>>>>>>>>> >>> Either way we go, the table-spec version does not need to be >>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in >>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>>>> version? >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior, >>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should only >>>>>>>>>> go to 2 >>>>>>>>>> only when we have incompatible fields/break changes according to >>>>>>>>>> discussion. >>>>>>>>>> >> >>>>>>>>>> >> Generally I agree with adding table-spec into capabilities. >>>>>>>>>> For example, we can expose this to user in api so that user could >>>>>>>>>> choose a >>>>>>>>>> supported table format version without throwing exception. >>>>>>>>>> >> >>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>> >>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how >>>>>>>>>> do you envision this to be used? >>>>>>>>>> >>> >>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2 >>>>>>>>>> approaches we can go with, taking CreateTable as an example: >>>>>>>>>> >>> >>>>>>>>>> >>> (1) increment the related operation version, which means that >>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow >>>>>>>>>> creating >>>>>>>>>> tables in the v3 version. >>>>>>>>>> >>> >>>>>>>>>> >>> (2) update the existing table metadata model to support both >>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently >>>>>>>>>> based on >>>>>>>>>> the TableMetadata.format-version field. If the server does not >>>>>>>>>> support v3, >>>>>>>>>> it can return unsupported at that time. >>>>>>>>>> >>> >>>>>>>>>> >>> Either way we go, the table-spec version does not need to be >>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in >>>>>>>>>> provisioning a new endpoint compared to (2). >>>>>>>>>> >>> >>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec >>>>>>>>>> version? >>>>>>>>>> >>> >>>>>>>>>> >>> -Jack >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner >>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote: >>>>>>>>>> >>>> >>>>>>>>>> >>>> >>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but >>>>>>>>>> I watched the recording today (thanks for providing that). >>>>>>>>>> >>>> >>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients >>>>>>>>>> handle the case when a service does _not_ return the capabilities >>>>>>>>>> field >>>>>>>>>> (absent). My proposal would be that a client should in this case >>>>>>>>>> assume >>>>>>>>>> that all _currently_ existing capabilities are supported. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> - tables: [1] >>>>>>>>>> >>>>> - views: [1] >>>>>>>>>> >>>>> - remote-signing: [1] >>>>>>>>>> >>>>> - multi-table-commit: [1] >>>>>>>>>> >>>>> - register-table: [1] >>>>>>>>>> >>>>> - table-metrics: [1] >>>>>>>>>> >>>>> - table-spec: [1,2] >>>>>>>>>> >>>>> - view-spec: [1,2] >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> >>>>>>>>>> >>>> The one thing I would like to add here is that the current >>>>>>>>>> PR uses the tables capability (as version 1) as the default when a >>>>>>>>>> server >>>>>>>>>> doesn't return capabilities but it might be also ok to include views >>>>>>>>>> (as >>>>>>>>>> version 1) because the current client impl has some code to deal with >>>>>>>>>> errors in case endpoints don't exist. >>>>>>>>>> >>>> >>>>>>>>>> >>>> Unless we agree that the currently existing functionality in >>>>>>>>>> the REST spec is the default behavior to be assumed for older >>>>>>>>>> server, I'm >>>>>>>>>> not sure about including remote-signing / multi-table-commit / >>>>>>>>>> register-table / table-metrics as it has been indicated in earlier >>>>>>>>>> comments >>>>>>>>>> on the PR/ML that not every REST server supports these. >>>>>>>>>> >>>> >>>>>>>>>> >>>> That being said, we should discuss whether we want the >>>>>>>>>> default behavior (when an older server doesn't send back >>>>>>>>>> capabilities) to be >>>>>>>>>> >>>> a) tables (version 1) only >>>>>>>>>> >>>> b) the currently existing functionality as defined in the >>>>>>>>>> REST spec (as version 1) >>>>>>>>>> >>>> >>>>>>>>>> >>>> >>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems to >>>>>>>>>> be more informative in its nature as I don't think a client would act >>>>>>>>>> differently right now when seeing these. >>>>>>>>>> >>>> >>>>>>>>>> >>>> Thanks >>>>>>>>>> >>>> Eduard >>>>>>>>>> >>>> >>>>>>>>>> >>>>>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Databricks >>>>> >>>>