>
> Are there any other concerns with the proposal or should we start a VOTE
> thread?


We should summarize the consensus since the thread is quite long
and then check if anyone has additional points to add before we proceed to
voting.

- Ajantha



On Fri, Jul 12, 2024 at 9:09 AM Jack Ye <yezhao...@gmail.com> wrote:

> > While a server can control this behavior and instruct the client to use
> remote signing, technically nothing is preventing a client from configuring
> s3.remote-signing-enabled=true. In such a case it seems more
> appropriate to indicate that this capability isn't supported rather than a
> generic 501, because not every server will support remote signing.
>
> This is what I did not fully understand, because it seems like we are
> saying in addition to the criteria of using capability to:
> - controlling client-side fallback behavior
> - failing expensive operations early if we know it will eventually fail
> due to missing capability
>
> you also try to fine-tune any client side behavior to match server side
> capabilities so it fails early and more gracefully rather than just a
> generic 501. But why does this logic not apply to per-endpoint versioning?
> Isn't it also nice to just fail at client side instead of calling server
> and getting a "generic 501"?
>
> -Jack
>
>
>
>
>
>
>
> On Thu, Jul 11, 2024 at 9:51 AM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Sorry I will take a look at the new comments later today.
>>
>> -Jack
>>
>> On Wed, Jul 10, 2024, 11:42 PM Eduard Tudenhöfner <
>> etudenhoef...@apache.org> wrote:
>>
>>> Are there any other concerns with the proposal or should we start a VOTE
>>> thread?
>>>
>>> Eduard
>>>
>>> On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>
>>>> Re: remote signing, I agree that it does not look like a server
>>>>> capability that a client can / should discover. It is more like something
>>>>> that the server instructs / configures the client to do.
>>>>
>>>>
>>>> While a server can control this behavior and instruct the client to use
>>>> remote signing, technically nothing is preventing a client from configuring
>>>> s3.remote-signing-enabled=true. In such a case it seems more
>>>> appropriate to indicate that this capability isn't supported rather than a
>>>> generic 501, because not every server will support remote signing.
>>>>
>>>>
>>>> Good point regarding clients taking initiative and using request
>>>> singing without an explicit server-provided config. It moves the client
>>>> operations into a mode where the server has more control (over having
>>>> longer term client-side credentials), so it looks like a reasonable mode to
>>>> support from the security perspective.
>>>>
>>>> Let's keep that capability flag.
>>>>
>>>> Cheers,
>>>> Dmitri.
>>>>
>>>> On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner <
>>>> etudenhoef...@apache.org> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> I've added a few inline comments below.
>>>>>
>>>>>
>>>>>
>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>> capability that a client can / should discover. It is more like something
>>>>>> that the server instructs / configures the client to do.
>>>>>
>>>>>
>>>>> While a server can control this behavior and instruct the client to
>>>>> use remote signing, technically nothing is preventing a client from
>>>>> configuring s3.remote-signing-enabled=true. In such a case it seems
>>>>> more appropriate to indicate that this capability isn't supported rather
>>>>> than a generic 501, because not every server will support remote signing.
>>>>>
>>>>> The *vended-credentials* capability on the other hand is more
>>>>> informative in its nature and a server indeed configures a client. I think
>>>>> that was also one of the reasons I removed this capability but added it
>>>>> later back due to a comment from Jack.
>>>>>
>>>>> I'm ok either way in terms of removing / keeping *vended-credentials*
>>>>> as a capability but given that we'd want to include *actionable* 
>>>>> capabilities
>>>>> at this point, I'd just remove it (nothing is preventing us from adding it
>>>>> later if necessary).
>>>>>
>>>>>
>>>>> In that case, why do we need all these other capabilities like tables,
>>>>>> remote-signing, etc. in the first place?
>>>>>
>>>>>
>>>>> Given that capabilities also carry versioning information, clients can
>>>>> make more informed decisions on which endpoints to call. One could argue
>>>>> that generally throwing a 501 on everything that isn't supported might be
>>>>> sufficient, but that doesn't necessarily help a client in knowing which
>>>>> versions of a capability are safe to call/use.
>>>>>
>>>>> Regarding the control of client-side fallback behavior:
>>>>> I think the default fallback behavior should be *tables* (with
>>>>> version 1) with a property in the REST catalog that allows configuring 
>>>>> this
>>>>> to e.g. *rest-default-capabilities=tables,views,abc,xyz* (all of them
>>>>> defaulting to version 1).
>>>>>
>>>>>
>>>>> Eduard
>>>>>
>>>>>
>>>>> On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>>> Yes I agree that sounds like a valid use case. So the criteria so far
>>>>>> is that capabilities are used for:
>>>>>> - controlling client-side fallback behavior
>>>>>> - failing expensive operations early if we know it will eventually
>>>>>> fail due to missing capability
>>>>>>
>>>>>> Do we agree if this is the criteria we should use? What about the
>>>>>> other capabilities, namly tables, remote-signing, credential-vending?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue <b...@databricks.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> > does it make a difference if I declare the capability or not?
>>>>>>>
>>>>>>> I think that it does in other cases. Multi-table commits, for
>>>>>>> example, are a building block for multi-statement transactions. If a
>>>>>>> service doesn't support multi-table commits then we ideally want 
>>>>>>> clients to
>>>>>>> know that ahead of time so that they don't run a big transaction and 
>>>>>>> then
>>>>>>> fail because the commit is not supported.
>>>>>>>
>>>>>>> On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>
>>>>>>>> Re: remote signing, I agree that it does not look like a server
>>>>>>>> capability that a client can / should discover. It is more like 
>>>>>>>> something
>>>>>>>> that the server instructs / configures the client to do.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Dmitri.
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2024 at 12:05 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I was reconciling the discussion yesterday, one point that was
>>>>>>>>> interesting to me was that we agreed the purpose of these 
>>>>>>>>> capabilities is
>>>>>>>>> to "control client-side fallback behavior", or at least the client 
>>>>>>>>> should
>>>>>>>>> behave differently based on these capabilities. However, this seems 
>>>>>>>>> to be
>>>>>>>>> only needed so far for views, or more specifically, for loadView API 
>>>>>>>>> only
>>>>>>>>> because it impacts the fallback behavior to resolve the identifier as 
>>>>>>>>> a
>>>>>>>>> table or not.
>>>>>>>>>
>>>>>>>>> For all the other capabilities listed, and even the other
>>>>>>>>> endpoints in view, because a server can decide to implement it 
>>>>>>>>> partially
>>>>>>>>> anyway and just document the behavior, does it make a difference if I
>>>>>>>>> declare the capability or not? The client will not stop the request, 
>>>>>>>>> the
>>>>>>>>> server will just error out if it is not supported. Maybe the error is 
>>>>>>>>> not
>>>>>>>>> in the expected code or message, but it is still an error. In that 
>>>>>>>>> case,
>>>>>>>>> why do we need all these other capabilities like tables, 
>>>>>>>>> remote-signing,
>>>>>>>>> etc. in the first place?
>>>>>>>>>
>>>>>>>>> Maybe it is too extreme of a thought, but could anyone help
>>>>>>>>> describe how the other capabilities could be used beyond potentially
>>>>>>>>> returning an error earlier?
>>>>>>>>>
>>>>>>>>> -Jack
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2024 at 8:02 AM Dmitri Bourlatchkov
>>>>>>>>> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Eduard,
>>>>>>>>>>
>>>>>>>>>> > I've also added the 501 error to the response of the respective
>>>>>>>>>> endpoints but worth mentioning that *HEAD* / *GET *requests must
>>>>>>>>>> not return a 501
>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>>>>> implies that the server impl would e.g. return a *404* in such a
>>>>>>>>>> case).
>>>>>>>>>>
>>>>>>>>>> My reading on the Mozilla page makes me think that it is phrased
>>>>>>>>>> too narrowly. Reading RFC 2616 [1] I believe that it does not 
>>>>>>>>>> preclude
>>>>>>>>>> responding with 501 to GET and HEAD requests. I think it means that 
>>>>>>>>>> GET and
>>>>>>>>>> HEAD methods must be supported by "general purpose" servers. The 
>>>>>>>>>> Iceberg
>>>>>>>>>> REST server is not a general purpose server for resources. So, I 
>>>>>>>>>> think it
>>>>>>>>>> should be fine to respond with 501 to unimplemented endpoints.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Dmitri.
>>>>>>>>>>
>>>>>>>>>> [1] https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 9, 2024 at 9:44 AM Eduard Tudenhöfner <
>>>>>>>>>> etudenhoef...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>
>>>>>>>>>>> I watched the catalog sync recording today and updated the PR
>>>>>>>>>>> <https://github.com/apache/iceberg/pull/9940> to remove
>>>>>>>>>>> fine-grained capabilities like *register-table / table-metrics*.
>>>>>>>>>>>
>>>>>>>>>>> The current capabilities (with versioning information) in the PR
>>>>>>>>>>> are:
>>>>>>>>>>>
>>>>>>>>>>>    - tables
>>>>>>>>>>>    - views
>>>>>>>>>>>    - remote-signing
>>>>>>>>>>>    - vended-credentials
>>>>>>>>>>>    - multi-table-commit
>>>>>>>>>>>
>>>>>>>>>>> For servers that only *partially* implement endpoints under a
>>>>>>>>>>> capability the spec requires the server to throw a *501 Not
>>>>>>>>>>> Implemented*. I've also added the 501 error to the response of
>>>>>>>>>>> the respective endpoints but worth mentioning that *HEAD* / *GET
>>>>>>>>>>> *requests must not return a 501
>>>>>>>>>>> <https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
>>>>>>>>>>> implies that the server impl would e.g. return a *404* in such
>>>>>>>>>>> a case).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Eduard
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 4, 2024 at 3:59 PM Jean-Baptiste Onofré <
>>>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Eduard,
>>>>>>>>>>>>
>>>>>>>>>>>> It makes sense to return 501 for servers which don't implement
>>>>>>>>>>>> all
>>>>>>>>>>>> endpoints. It means that the server will at least have to
>>>>>>>>>>>> implement
>>>>>>>>>>>> empty endpoints if needed (that makes sense to me).
>>>>>>>>>>>>
>>>>>>>>>>>> I think we should focus on only "identified capabilities". I
>>>>>>>>>>>> think
>>>>>>>>>>>> that I proposed before that the capabilities can be
>>>>>>>>>>>> overridden/provided by server implementation. Else, I'm afraid
>>>>>>>>>>>> we
>>>>>>>>>>>> won't be flexible enough or always behind the implementation
>>>>>>>>>>>> (if an
>>>>>>>>>>>> implementation wants to add "my-foo-cap").
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>> JB
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 4, 2024 at 9:32 AM Eduard Tudenhöfner
>>>>>>>>>>>> <etudenhoef...@apache.org> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > I have clarified the wording in #9940 around the requirement
>>>>>>>>>>>> on having to implement all endpoints under a particular capability.
>>>>>>>>>>>> >
>>>>>>>>>>>> > For servers that only partially implement endpoints under a
>>>>>>>>>>>> capability the spec requires the server to throw a 501 Not 
>>>>>>>>>>>> Implemented.
>>>>>>>>>>>> This was suggested by Jack and it seems reasonable to do that.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Regarding the inclusion of table-spec / view-spec as a
>>>>>>>>>>>> capability: I think this might make sense for the next iteration 
>>>>>>>>>>>> of the
>>>>>>>>>>>> REST spec but as I mentioned earlier I don't see any clear benefit 
>>>>>>>>>>>> for the
>>>>>>>>>>>> current REST spec as the client wouldn't do anything with that 
>>>>>>>>>>>> information.
>>>>>>>>>>>> > If there is a clear benefit of having this, then this can
>>>>>>>>>>>> still be added later to the current REST spec but I believe we 
>>>>>>>>>>>> should
>>>>>>>>>>>> rather have a few well-defined and actionable capabilities rather 
>>>>>>>>>>>> than too
>>>>>>>>>>>> many.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Eduard
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Wed, Jul 3, 2024 at 5:44 AM Renjie Liu <
>>>>>>>>>>>> liurenjie2...@gmail.com> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>> how do you envision this to be used?
>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are
>>>>>>>>>>>> 2 approaches we can go with, taking CreateTable as an example:
>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created and 
>>>>>>>>>>>> allow
>>>>>>>>>>>> creating tables in the v3 version.
>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>> differently
>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>> does not
>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to
>>>>>>>>>>>> be a capability. (1) seems to be cleaner, but has some overhead in
>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>>>>> version?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> 2 is cleaner but maybe inconsistent with current behavior,
>>>>>>>>>>>> since /v1/tables operation supports both v1 and v3. We should only 
>>>>>>>>>>>> go to 2
>>>>>>>>>>>> only when we have incompatible fields/break changes according to 
>>>>>>>>>>>> discussion.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Generally I agree with adding table-spec into capabilities.
>>>>>>>>>>>> For example, we can expose this to user in api so that user could 
>>>>>>>>>>>> choose a
>>>>>>>>>>>> supported table format version without throwing exception.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Wed, Jul 3, 2024 at 12:18 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Spec is an interesting topic we did not discuss. Robert,
>>>>>>>>>>>> how do you envision this to be used?
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> In my mind, if a new table format v3 is launched, there are
>>>>>>>>>>>> 2 approaches we can go with, taking CreateTable as an example:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> (1) increment the related operation version, which means
>>>>>>>>>>>> that POST /v2/{prefix}/namespaces/{ns}/tables will be created and 
>>>>>>>>>>>> allow
>>>>>>>>>>>> creating tables in the v3 version.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> (2) update the existing table metadata model to support
>>>>>>>>>>>> both v2 and v3 fields, and the server enforces the payload 
>>>>>>>>>>>> differently
>>>>>>>>>>>> based on the TableMetadata.format-version field. If the server 
>>>>>>>>>>>> does not
>>>>>>>>>>>> support v3, it can return unsupported at that time.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Either way we go, the table-spec version does not need to
>>>>>>>>>>>> be a capability. (1) seems to be cleaner, but has some overhead in
>>>>>>>>>>>> provisioning a new endpoint compared to (2).
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Do you see another way to do this leveraging the table-spec
>>>>>>>>>>>> version?
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> -Jack
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Tue, Jul 2, 2024 at 6:03 AM Eduard Tudenhöfner
>>>>>>>>>>>> <eduard.tudenhoef...@databricks.com.invalid> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> I couldn't make it to the catalog sync meeting yesterday
>>>>>>>>>>>> but I watched the recording today (thanks for providing that).
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>> The missing piece is how (new, capabilities-aware)
>>>>>>>>>>>> clients handle the case when a service does _not_ return the 
>>>>>>>>>>>> capabilities
>>>>>>>>>>>> field (absent). My proposal would be that a client should in this 
>>>>>>>>>>>> case
>>>>>>>>>>>> assume that all _currently_ existing capabilities are supported.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> - tables: [1]
>>>>>>>>>>>> >>>>> - views: [1]
>>>>>>>>>>>> >>>>> - remote-signing: [1]
>>>>>>>>>>>> >>>>> - multi-table-commit: [1]
>>>>>>>>>>>> >>>>> - register-table: [1]
>>>>>>>>>>>> >>>>> - table-metrics: [1]
>>>>>>>>>>>> >>>>> - table-spec: [1,2]
>>>>>>>>>>>> >>>>> - view-spec: [1,2]
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>> The one thing I would like to add here is that the current
>>>>>>>>>>>> PR uses the tables capability (as version 1) as the default when a 
>>>>>>>>>>>> server
>>>>>>>>>>>> doesn't return capabilities but it might be also ok to include 
>>>>>>>>>>>> views (as
>>>>>>>>>>>> version 1) because the current client impl has some code to deal 
>>>>>>>>>>>> with
>>>>>>>>>>>> errors in case endpoints don't exist.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Unless we agree that the currently existing functionality
>>>>>>>>>>>> in the REST spec is the default behavior to be assumed for older 
>>>>>>>>>>>> server,
>>>>>>>>>>>> I'm not sure about including remote-signing / multi-table-commit /
>>>>>>>>>>>> register-table / table-metrics as it has been indicated in earlier 
>>>>>>>>>>>> comments
>>>>>>>>>>>> on the PR/ML that not every REST server supports these.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> That being said, we should discuss whether we want the
>>>>>>>>>>>> default behavior (when an older server doesn't send back 
>>>>>>>>>>>> capabilities) to be
>>>>>>>>>>>> >>>> a) tables (version 1) only
>>>>>>>>>>>> >>>> b) the currently existing functionality as defined in the
>>>>>>>>>>>> REST spec (as version 1)
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On another note: Including table-spec / view-spec seems to
>>>>>>>>>>>> be more informative in its nature as I don't think a client would 
>>>>>>>>>>>> act
>>>>>>>>>>>> differently right now when seeing these.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Thanks
>>>>>>>>>>>> >>>> Eduard
>>>>>>>>>>>> >>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Databricks
>>>>>>>
>>>>>>

Reply via email to