Sorry I will take a look at the new comments later today.

-Jack

On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <etudenhoef...@apache.org>
wrote:

> Are there any other concerns with the proposal or should we start a VOTE
> thread?
>
> Eduard
>
> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>
>> Re: remote signing, I agree that it does not look like a server
>>> capability that a client can / should discover. It is more like something
>>> that the server instructs / configures the client to do.
>>
>>
>> While a server can control this behavior and instruct the client to use
>> remote signing, technically nothing is preventing a client from configuring
>> s3.remote-signing-enabled=true. In such a case it seems more
>> appropriate to indicate that this capability isn't supported rather than a
>> generic 501, because not every server will support remote signing.
>>
>>
>> Good point regarding clients taking initiative and using request singing
>> without an explicit server-provided config. It moves the client operations
>> into a mode where the server has more control (over having longer term
>> client-side credentials), so it looks like a reasonable mode to support
>> from the security perspective.
>>
>> Let's keep that capability flag.
>>
>> Cheers,
>> Dmitri.
>>
>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Hey everyone,
>>>
>>> I've added a few inline comments below.
>>>
>>>
>>>
>>>> Re: remote signing, I agree that it does not look like a server
>>>> capability that a client can / should discover. It is more like something
>>>> that the server instructs / configures the client to do.
>>>
>>>
>>> While a server can control this behavior and instruct the client to use
>>> remote signing, technically nothing is preventing a client from configuring
>>> s3.remote-signing-enabled=true. In such a case it seems more
>>> appropriate to indicate that this capability isn't supported rather than a
>>> generic 501, because not every server will support remote signing.
>>>
>>> The *vended-credentials* capability on the other hand is more
>>> informative in its nature and a server indeed configures a client. I think
>>> that was also one of the reasons I removed this capability but added it
>>> later back due to a comment from Jack.
>>>
>>> I'm ok either way in terms of removing / keeping *vended-credentials*
>>> as a capability but given that we'd want to include *actionable* 
>>> capabilities
>>> at this point, I'd just remove it (nothing is preventing us from adding it
>>> later if necessary).
>>>
>>>
>>> In that case, why do we need all these other capabilities like tables,
>>>> remote-signing, etc. in the first place?
>>>
>>>
>>> Given that capabilities also carry versioning information, clients can
>>> make more informed decisions on which endpoints to call. One could argue
>>> that generally throwing a 501 on everything that isn't supported might be
>>> sufficient, but that doesn't necessarily help a client in knowing which
>>> versions of a capability are safe to call/use.
>>>
>>> Regarding the control of client-side fallback behavior:
>>> I think the default fallback behavior should be *tables* (with version
>>> 1) with a property in the REST catalog that allows configuring this to e.g.
>>> *rest-default-capabilities=tables,views,abc,xyz* (all of them
>>> defaulting to version 1).
>>>
>>>
>>> Eduard
>>>
>>>
>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Yes I agree that sounds like a valid use case. So the criteria so far
>>>> is that capabilities are used for:
>>>> - controlling client-side fallback behavior
>>>> - failing expensive operations early if we know it will eventually fail
>>>> due to missing capability
>>>>
>>>> Do we agree if this is the criteria we should use? What about the other
>>>> capabilities, namly tables, remote-signing, credential-vending?
>>>>
>>>> -Jack
>>>>
>>>>
>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid>
>>>> wrote:
>>>>
>>>>> > does it make a difference if I declare the capability or not?
>>>>>
>>>>> I think that it does in other cases. Multi-table commits, for example,
>>>>> are a building block for multi-statement transactions. If a service 
>>>>> doesn't
>>>>> support multi-table commits then we ideally want clients to know that 
>>>>> ahead
>>>>> of time so that they don't run a big transaction and then fail because the
>>>>> commit is not supported.
>>>>>
>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>
>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>> capability that a client can / should discover. It is more like something
>>>>>> that the server instructs / configures the client to do.
>>>>>>
>>>>>> Cheers,
>>>>>> Dmitri.
>>>>>>
>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>
>>>>>>> I was reconciling the discussion yesterday, one point that was
>>>>>>> interesting to me was that we agreed the purpose of these capabilities 
>>>>>>> is
>>>>>>> to "control client-side fallback behavior", or at least the client 
>>>>>>> should
>>>>>>> behave differently based on these capabilities. However, this seems to 
>>>>>>> be
>>>>>>> only needed so far for views, or more specifically, for loadView API 
>>>>>>> only
>>>>>>> because it impacts the fallback behavior to resolve the identifier as a
>>>>>>> table or not.
>>>>>>>
>>>>>>> For all the other capabilities listed, and even the other endpoints
>>>>>>> in view, because a server can decide to implement it partially anyway 
>>>>>>> and
>>>>>>> just document the behavior, does it make a difference if I declare the
>>>>>>> capability or not? The client will not stop the request, the server will
>>>>>>> just error out if it is not supported. Maybe the error is not in the
>>>>>>> expected code or message, but it is still an error. In that case, why 
>>>>>>> do we
>>>>>>> need all these other capabilities like tables, remote-signing, etc. in 
>>>>>>> the
>>>>>>> first place?
>>>>>>>
>>>>>>> Maybe it is too extreme of a thought, but could anyone help describe
>>>>>>> how the other capabilities could be used beyond potentially returning an
>>>>>>> error earlier?
>>>>>>>
>>>>>>> -Jack
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Eduard,
>>>>>>>>
>>>>>>>> > I've also added the 501 error to the response of the respective
>>>>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must
>>>>>>>> not return a 501
>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>>>> case).
>>>>>>>>
>>>>>>>> My reading on the Mozilla page makes me think that it is phrased
>>>>>>>> too narrowly. Reading RFC 2616 [1] I believe that it does not preclude
>>>>>>>> responding with 501 to GET and HEAD requests. I think it means that 
>>>>>>>> GET and
>>>>>>>> HEAD methods must be supported by "general purpose" servers. The 
>>>>>>>> Iceberg
>>>>>>>> REST server is not a general purpose server for resources. So, I think 
>>>>>>>> it
>>>>>>>> should be fine to respond with 501 to unimplemented endpoints.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Dmitri.
>>>>>>>>
>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hey everyone,
>>>>>>>>>
>>>>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>> fine-grained capabilities like *register-table / table-metrics*.
>>>>>>>>>
>>>>>>>>> The current capabilities (with versioning information) in the PR
>>>>>>>>> are:
>>>>>>>>>
>>>>>>>>>    - tables
>>>>>>>>>    - views
>>>>>>>>>    - remote-signing
>>>>>>>>>    - vended-credentials
>>>>>>>>>    - multi-table-commit
>>>>>>>>>
>>>>>>>>> For servers that only *partially* implement endpoints under a
>>>>>>>>> capability the spec requires the server to throw a *501 Not
>>>>>>>>> Implemented*. I've also added the 501 error to the response of
>>>>>>>>> the respective endpoints but worth mentioning that *HEAD* / *GET 
>>>>>>>>> *requests
>>>>>>>>> must not return a 501
>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>>>>> case).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Eduard
>>>>>>>>>
>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Eduard,
>>>>>>>>>>
>>>>>>>>>> It makes sense to return 501 for servers which don't implement all
>>>>>>>>>> endpoints. It means that the server will at least have to
>>>>>>>>>> implement
>>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>>
>>>>>>>>>> I think we should focus on only "identified capabilities". I think
>>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid we
>>>>>>>>>> won't be flexible enough or always behind the implementation (if
>>>>>>>>>> an
>>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>> >
>>>>>>>>>> > I have clarified the wording in #9940 around the requirement on
>>>>>>>>>> having to implement all endpoints under a particular capability.
>>>>>>>>>> >
>>>>>>>>>> > For servers that only partially implement endpoints under a
>>>>>>>>>> capability the spec requires the server to throw a 501 Not 
>>>>>>>>>> Implemented.
>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>>>>> >
>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>> capability: I think this might make sense for the next iteration of 
>>>>>>>>>> the
>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit 
>>>>>>>>>> for the
>>>>>>>>>> current REST spec as the client wouldn't do anything with that 
>>>>>>>>>> information.
>>>>>>>>>> > If there is a clear benefit of having this, then this can still
>>>>>>>>>> be added later to the current REST spec but I believe we should 
>>>>>>>>>> rather have
>>>>>>>>>> a few well-defined and actionable capabilities rather than too many.
>>>>>>>>>> >
>>>>>>>>>> > Eduard
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how
>>>>>>>>>> do you envision this to be used?
>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>>>>> creating
>>>>>>>>>> tables in the v3 version.
>>>>>>>>>> >>> (2) update the existing table metadata model to support both
>>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently 
>>>>>>>>>> based on
>>>>>>>>>> the TableMetadata.format-version field. If the server does not 
>>>>>>>>>> support v3,
>>>>>>>>>> it can return unsupported at that time.
>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be
>>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in
>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>>> version?
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior,
>>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should only 
>>>>>>>>>> go to 2
>>>>>>>>>> only when we have incompatible fields/break changes according to 
>>>>>>>>>> discussion.
>>>>>>>>>> >>
>>>>>>>>>> >> Generally I agree with adding table-spec into capabilities.
>>>>>>>>>> For example, we can expose this to user in api so that user could 
>>>>>>>>>> choose a
>>>>>>>>>> supported table format version without throwing exception.
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert, how
>>>>>>>>>> do you envision this to be used?
>>>>>>>>>> >>>
>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are 2
>>>>>>>>>> approaches we can go with, taking CreateTable as an example:
>>>>>>>>>> >>>
>>>>>>>>>> >>> (1) increment the related operation version, which means that
>>>>>>>>>> POST /v2/{prefix}/namespaces/{ns}/tables will be created and allow 
>>>>>>>>>> creating
>>>>>>>>>> tables in the v3 version.
>>>>>>>>>> >>>
>>>>>>>>>> >>> (2) update the existing table metadata model to support both
>>>>>>>>>> v2 and v3 fields, and the server enforces the payload differently 
>>>>>>>>>> based on
>>>>>>>>>> the TableMetadata.format-version field. If the server does not 
>>>>>>>>>> support v3,
>>>>>>>>>> it can return unsupported at that time.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Either way we go, the table-spec version does not need to be
>>>>>>>>>> a capability. (1) seems to be cleaner, but has some overhead in
>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>> >>>
>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>>> version?
>>>>>>>>>> >>>
>>>>>>>>>> >>> -Jack
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday but
>>>>>>>>>> I watched the recording today (thanks for providing that).
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware) clients
>>>>>>>>>> handle the case when a service does _not_ return the capabilities 
>>>>>>>>>> field
>>>>>>>>>> (absent). My proposal would be that a client should in this case 
>>>>>>>>>> assume
>>>>>>>>>> that all _currently_ existing capabilities are supported.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>> The one thing I would like to add here is that the current
>>>>>>>>>> PR uses the tables capability (as version 1) as the default when a 
>>>>>>>>>> server
>>>>>>>>>> doesn't return capabilities but it might be also ok to include views 
>>>>>>>>>> (as
>>>>>>>>>> version 1) because the current client impl has some code to deal with
>>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Unless we agree that the currently existing functionality in
>>>>>>>>>> the REST spec is the default behavior to be assumed for older 
>>>>>>>>>> server, I'm
>>>>>>>>>> not sure about including remote-signing / multi-table-commit /
>>>>>>>>>> register-table / table-metrics as it has been indicated in earlier 
>>>>>>>>>> comments
>>>>>>>>>> on the PR/ML that not every REST server supports these.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>>> default behavior (when an older server doesn't send back 
>>>>>>>>>> capabilities) to be
>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>> >>>> b) the currently existing functionality as defined in the
>>>>>>>>>> REST spec (as version 1)
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems to
>>>>>>>>>> be more informative in its nature as I don't think a client would act
>>>>>>>>>> differently right now when seeing these.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Thanks
>>>>>>>>>> >>>> Eduard
>>>>>>>>>> >>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Databricks
>>>>>
>>>>

Reply via email to