Re: [DISCUSS] Describing REST Server capabilities

Eduard Tudenhöfner Wed, 10 Jul 2024 23:42:52 -0700

Are there any other concerns with the proposal or should we start a VOTE
thread?


Eduard

On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:

> Re: remote signing, I agree that it does not look like a server capability
>> that a client can / should discover. It is more like something that the
>> server instructs / configures the client to do.
>
>
> While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
>
> Good point regarding clients taking initiative and using request singing
> without an explicit server-provided config. It moves the client operations
> into a mode where the server has more control (over having longer term
> client-side credentials), so it looks like a reasonable mode to support
> from the security perspective.
>
> Let's keep that capability flag.
>
> Cheers,
> Dmitri.
>
> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
> etudenhoef...@apache.org> wrote:
>
>> Hey everyone,
>>
>> I've added a few inline comments below.
>>
>>
>>
>>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>> The *vended-credentials* capability on the other hand is more
>> informative in its nature and a server indeed configures a client. I think
>> that was also one of the reasons I removed this capability but added it
>> later back due to a comment from Jack.
>>
>> I'm ok either way in terms of removing / keeping *vended-credentials* as
>> a capability but given that we'd want to include *actionable* capabilities
>> at this point, I'd just remove it (nothing is preventing us from adding it
>> later if necessary).
>>
>>
>> In that case, why do we need all these other capabilities like tables,
>>> remote-signing, etc. in the first place?
>>
>>
>> Given that capabilities also carry versioning information, clients can
>> make more informed decisions on which endpoints to call. One could argue
>> that generally throwing a 501 on everything that isn't supported might be
>> sufficient, but that doesn't necessarily help a client in knowing which
>> versions of a capability are safe to call/use.
>>
>> Regarding the control of client-side fallback behavior:
>> I think the default fallback behavior should be *tables* (with version
>> 1) with a property in the REST catalog that allows configuring this to e.g.
>> *rest-default-capabilities=tables,views,abc,xyz* (all of them defaulting
>> to version 1).
>>
>>
>> Eduard
>>
>>
>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Yes I agree that sounds like a valid use case. So the criteria so far is
>>> that capabilities are used for:
>>> - controlling client-side fallback behavior
>>> - failing expensive operations early if we know it will eventually fail
>>> due to missing capability
>>>
>>> Do we agree if this is the criteria we should use? What about the other
>>> capabilities, namly tables, remote-signing, credential-vending?
>>>
>>> -Jack
>>>
>>>
>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid>
>>> wrote:
>>>
>>>> > does it make a difference if I declare the capability or not?
>>>>
>>>> I think that it does in other cases. Multi-table commits, for example,
>>>> are a building block for multi-statement transactions. If a service doesn't
>>>> support multi-table commits then we ideally want clients to know that ahead
>>>> of time so that they don't run a big transaction and then fail because the
>>>> commit is not supported.
>>>>
>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>
>>>>> Re: remote signing, I agree that it does not look like a server
>>>>> capability that a client can / should discover. It is more like something
>>>>> that the server instructs / configures the client to do.
>>>>>
>>>>> Cheers,
>>>>> Dmitri.
>>>>>
>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> I was reconciling the discussion yesterday, one point that was
>>>>>> interesting to me was that we agreed the purpose of these capabilities is
>>>>>> to "control client-side fallback behavior", or at least the client should
>>>>>> behave differently based on these capabilities. However, this seems to be
>>>>>> only needed so far for views, or more specifically, for loadView API only
>>>>>> because it impacts the fallback behavior to resolve the identifier as a
>>>>>> table or not.
>>>>>>
>>>>>> For all the other capabilities listed, and even the other endpoints
>>>>>> in view, because a server can decide to implement it partially anyway and
>>>>>> just document the behavior, does it make a difference if I declare the
>>>>>> capability or not? The client will not stop the request, the server will
>>>>>> just error out if it is not supported. Maybe the error is not in the
>>>>>> expected code or message, but it is still an error. In that case, why do 
>>>>>> we
>>>>>> need all these other capabilities like tables, remote-signing, etc. in 
>>>>>> the
>>>>>> first place?
>>>>>>
>>>>>> Maybe it is too extreme of a thought, but could anyone help describe
>>>>>> how the other capabilities could be used beyond potentially returning an
>>>>>> error earlier?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>
>>>>>>> Hi Eduard,
>>>>>>>
>>>>>>> > I've also added the 501 error to the response of the respective
>>>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must
>>>>>>> not return a 501
>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>>> case).
>>>>>>>
>>>>>>> My reading on the Mozilla page makes me think that it is phrased too
>>>>>>> narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>>>>>>> responding with 501 to GET and HEAD requests. I think it means that GET 
>>>>>>> and
>>>>>>> HEAD methods must be supported by "general purpose" servers. The Iceberg
>>>>>>> REST server is not a general purpose server for resources. So, I think 
>>>>>>> it
>>>>>>> should be fine to respond with 501 to unimplemented endpoints.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>
>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hey everyone,
>>>>>>>>
>>>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>> fine-grained capabilities like *register-table / table-metrics*.
>>>>>>>>
>>>>>>>> The current capabilities (with versioning information) in the PR
>>>>>>>> are:
>>>>>>>>
>>>>>>>>    - tables
>>>>>>>>    - views
>>>>>>>>    - remote-signing
>>>>>>>>    - vended-credentials
>>>>>>>>    - multi-table-commit
>>>>>>>>
>>>>>>>> For servers that only *partially* implement endpoints under a
>>>>>>>> capability the spec requires the server to throw a *501 Not
>>>>>>>> Implemented*. I've also added the 501 error to the response of the
>>>>>>>> respective endpoints but worth mentioning that *HEAD* / *GET *requests
>>>>>>>> must not return a 501
>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>>>> case).
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Eduard
>>>>>>>>
>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Eduard,
>>>>>>>>>
>>>>>>>>> It makes sense to return 501 for servers which don't implement all
>>>>>>>>> endpoints. It means that the server will at least have to implement
>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>
>>>>>>>>> I think we should focus on only "identified capabilities". I think
>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid we
>>>>>>>>> won't be flexible enough or always behind the implementation (if an
>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>> >
>>>>>>>>> > I have clarified the wording in #9940 around the requirement on
>>>>>>>>> having to implement all endpoints under a particular capability.
>>>>>>>>> >
>>>>>>>>> > For servers that only partially implement endpoints under a
>>>>>>>>> capability the spec requires the server to throw a 501 Not 
>>>>>>>>> Implemented.
>>>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>>>> >
>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>> capability: I think this might make sense for the next iteration of 
>>>>>>>>> the
>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit 
>>>>>>>>> for the
>>>>>>>>> current REST spec as the client wouldn't do anything with that 
>>>>>>>>> information.
>>>>>>>>> > If there is a clear benefit of having this, then this can still
>>>>>>>>> be added later to the current REST spec but I believe we should 
>>>>>>>>> rather have
>>>>>>>>> a few well-defined and actionable capabilities rather than too many.
>>>>>>>>> >
>>>>>>>>> > Eduard
>>>>>>>>> >
>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how
>>>>>>>>> do you envision this to be used?
>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>>>> creating
>>>>>>>>> tables in the v3 version.
>>>>>>>>> >>> (2) update the existing table metadata model to support both
>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently 
>>>>>>>>> based on
>>>>>>>>> the TableMetadata.format-version field. If the server does not 
>>>>>>>>> support v3,
>>>>>>>>> it can return unsupported at that time.
>>>>>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>>>>>> capability. (1) seems to be cleaner, but has some overhead in 
>>>>>>>>> provisioning
>>>>>>>>> a new endpoint compared to (2).
>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>> version?
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior,
>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should only go 
>>>>>>>>> to 2
>>>>>>>>> only when we have incompatible fields/break changes according to 
>>>>>>>>> discussion.
>>>>>>>>> >>
>>>>>>>>> >> Generally I agree with adding table-spec into capabilities. For
>>>>>>>>> example, we can expose this to user in api so that user could choose a
>>>>>>>>> supported table format version without throwing exception.
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how
>>>>>>>>> do you envision this to be used?
>>>>>>>>> >>>
>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>>>> >>>
>>>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>>>> creating
>>>>>>>>> tables in the v3 version.
>>>>>>>>> >>>
>>>>>>>>> >>> (2) update the existing table metadata model to support both
>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently 
>>>>>>>>> based on
>>>>>>>>> the TableMetadata.format-version field. If the server does not 
>>>>>>>>> support v3,
>>>>>>>>> it can return unsupported at that time.
>>>>>>>>> >>>
>>>>>>>>> >>> Either way we go, the table-spec version does not need to be a
>>>>>>>>> capability. (1) seems to be cleaner, but has some overhead in 
>>>>>>>>> provisioning
>>>>>>>>> a new endpoint compared to (2).
>>>>>>>>> >>>
>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>> version?
>>>>>>>>> >>>
>>>>>>>>> >>> -Jack
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but
>>>>>>>>> I watched the recording today (thanks for providing that).
>>>>>>>>> >>>>
>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients
>>>>>>>>> handle the case when a service does _not_ return the capabilities 
>>>>>>>>> field
>>>>>>>>> (absent). My proposal would be that a client should in this case 
>>>>>>>>> assume
>>>>>>>>> that all _currently_ existing capabilities are supported.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>> >>>>> - views: [1]
>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>> The one thing I would like to add here is that the current PR
>>>>>>>>> uses the tables capability (as version 1) as the default when a server
>>>>>>>>> doesn't return capabilities but it might be also ok to include views 
>>>>>>>>> (as
>>>>>>>>> version 1) because the current client impl has some code to deal with
>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Unless we agree that the currently existing functionality in
>>>>>>>>> the REST spec is the default behavior to be assumed for older server, 
>>>>>>>>> I'm
>>>>>>>>> not sure about including remote-signing / multi-table-commit /
>>>>>>>>> register-table / table-metrics as it has been indicated in earlier 
>>>>>>>>> comments
>>>>>>>>> on the PR/ML that not every REST server supports these.
>>>>>>>>> >>>>
>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>> default behavior (when an older server doesn't send back 
>>>>>>>>> capabilities) to be
>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>> >>>> b) the currently existing functionality as defined in the
>>>>>>>>> REST spec (as version 1)
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems to be
>>>>>>>>> more informative in its nature as I don't think a client would act
>>>>>>>>> differently right now when seeing these.
>>>>>>>>> >>>>
>>>>>>>>> >>>> Thanks
>>>>>>>>> >>>> Eduard
>>>>>>>>> >>>>
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Databricks
>>>>
>>>

Re: [DISCUSS] Describing REST Server capabilities

Reply via email to