Are there any other concerns with the proposal or should we start a
VOTE thread?
Eduard
On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:
Re: remote signing, I agree that it does not look like a
server capability that a client can / should discover. It
is more like something that the server instructs /
configures the client to do.
While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more
appropriate to indicate that this capability isn't supported
rather than a generic 501, because not every server will
support remote signing.
Good point regarding clients taking initiative and using request
singing without an explicit server-provided config. It moves the
client operations into a mode where the server has more control
(over having longer term client-side credentials), so it looks
like a reasonable mode to support from the security perspective.
Let's keep that capability flag.
Cheers,
Dmitri.
On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
Hey everyone,
I've added a few inline comments below.
Re: remote signing, I agree that it does not look like a
server capability that a client can / should discover. It
is more like something that the server instructs /
configures the client to do.
While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems more
appropriate to indicate that this capability isn't supported
rather than a generic 501, because not every server will
support remote signing.
The *vended-credentials* capability on the other hand is more
informative in its nature and a server indeed configures a
client. I think that was also one of the reasons I removed
this capability but added it later back due to a comment from
Jack.
I'm ok either way in terms of removing / keeping
*vended-credentials* as a capability but given that we'd want
to include *actionable* capabilities at this point, I'd just
remove it (nothing is preventing us from adding it later if
necessary).
In that case, why do we need all these other capabilities
like tables, remote-signing, etc. in the first place?
Given that capabilities also carry versioning information,
clients can make more informed decisions on which endpoints to
call. One could argue that generally throwing a 501 on
everything that isn't supported might be sufficient, but that
doesn't necessarily help a client in knowing which versions of
a capability are safe to call/use.
Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be *tables* (with
version 1) with a property in the REST catalog that allows
configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz* (all of them
defaulting to version 1).
Eduard
On Tue, Jul 9, 2024 at 7:00 PM Jack Ye <yezhao...@gmail.com>
wrote:
Yes I agree that sounds like a valid use case. So the
criteria so far is that capabilities are used for:
- controlling client-side fallback behavior
- failing expensive operations early if we know it will
eventually fail due to missing capability
Do we agree if this is the criteria we should use? What
about the other capabilities, namly tables,
remote-signing, credential-vending?
-Jack
On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
<b...@databricks.com.invalid> wrote:
> does it make a difference if I declare the
capability or not?
I think that it does in other cases. Multi-table
commits, for example, are a building block for
multi-statement transactions. If a service doesn't
support multi-table commits then we ideally want
clients to know that ahead of time so that they don't
run a big transaction and then fail because the commit
is not supported.
On Tue, Jul 9, 2024 at 9:12 AM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:
Re: remote signing, I agree that it does not look
like a server capability that a client can /
should discover. It is more like something that
the server instructs / configures the client to do.
Cheers,
Dmitri.
On Tue, Jul 9, 2024 at 12:05 PM Jack Ye
<yezhao...@gmail.com> wrote:
I was reconciling the discussion yesterday,
one point that was interesting to me was that
we agreed the purpose of these capabilities is
to "control client-side fallback behavior", or
at least the client should behave differently
based on these capabilities. However, this
seems to be only needed so far for views, or
more specifically, for loadView API only
because it impacts the fallback behavior to
resolve the identifier as a table or not.
For all the other capabilities listed, and
even the other endpoints in view, because a
server can decide to implement it partially
anyway and just document the behavior, does it
make a difference if I declare the capability
or not? The client will not stop the request,
the server will just error out if it is not
supported. Maybe the error is not in the
expected code or message, but it is still an
error. In that case, why do we need all these
other capabilities like tables,
remote-signing, etc. in the first place?
Maybe it is too extreme of a thought, but
could anyone help describe how the other
capabilities could be used beyond potentially
returning an error earlier?
-Jack
On Tue, Jul 9, 2024 at 8:02 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:
Hi Eduard,
> I've also added the 501 error to the
response of the respective endpoints but
worth mentioning that *HEAD* / *GET
*requests must not return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl would e.g.
return a *404* in such a case).
My reading on the Mozilla page makes me
think that it is phrased too narrowly.
Reading RFC 2616 [1] I believe that it
does not preclude responding with 501 to
GET and HEAD requests. I think it means
that GET and HEAD methods must be
supported by "general purpose" servers.
The Iceberg REST server is not a general
purpose server for resources. So, I think
it should be fine to respond with 501 to
unimplemented endpoints.
Cheers,
Dmitri.
[1]
https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
On Tue, Jul 9, 2024 at 9:44 AM Eduard
Tudenhöfner <etudenhoef...@apache.org> wrote:
Hey everyone,
I watched the catalog sync recording
today and updated the PR
<https://github.com/apache/iceberg/pull/9940>
to remove fine-grained capabilities
like *register-table / table-metrics*.
The current capabilities (with
versioning information) in the PR are:
* tables
* views
* remote-signing
* vended-credentials
* multi-table-commit
For servers that only
*partially* implement endpoints under
a capability the spec requires the
server to throw a *501 Not
Implemented*. I've also added the 501
error to the response of the
respective endpoints but worth
mentioning that *HEAD* / *GET
*requests must not return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl would
e.g. return a *404* in such a case).
Regards
Eduard
On Thu, Jul 4, 2024 at 3:59 PM
Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:
Hi Eduard,
It makes sense to return 501 for
servers which don't implement all
endpoints. It means that the
server will at least have to implement
empty endpoints if needed (that
makes sense to me).
I think we should focus on only
"identified capabilities". I think
that I proposed before that the
capabilities can be
overridden/provided by server
implementation. Else, I'm afraid we
won't be flexible enough or always
behind the implementation (if an
implementation wants to add
"my-foo-cap").
Regards
JB
On Thu, Jul 4, 2024 at 9:32 AM
Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
>
> I have clarified the wording in
#9940 around the requirement on
having to implement all endpoints
under a particular capability.
>
> For servers that only partially
implement endpoints under a
capability the spec requires the
server to throw a 501 Not
Implemented. This was suggested by
Jack and it seems reasonable to do
that.
>
> Regarding the inclusion of
table-spec / view-spec as a
capability: I think this might
make sense for the next iteration
of the REST spec but as I
mentioned earlier I don't see any
clear benefit for the current REST
spec as the client wouldn't do
anything with that information.
> If there is a clear benefit of
having this, then this can still
be added later to the current REST
spec but I believe we should
rather have a few well-defined and
actionable capabilities rather
than too many.
>
> Eduard
>
> On Wed, Jul 3, 2024 at 5:44 AM
Renjie Liu
<liurenjie2...@gmail.com> wrote:
>>>
>>> Spec is an interesting topic
we did not discuss. Robert, how do
you envision this to be used?
>>> In my mind, if a new table
format v3 is launched, there are 2
approaches we can go with, taking
CreateTable as an example:
>>> (1) increment the related
operation version, which means
that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and allow creating
tables in the v3 version.
>>> (2) update the existing table
metadata model to support both v2
and v3 fields, and the server
enforces the payload differently
based on the
TableMetadata.format-version
field. If the server does not
support v3, it can return
unsupported at that time.
>>> Either way we go, the
table-spec version does not need
to be a capability. (1) seems to
be cleaner, but has some overhead
in provisioning a new endpoint
compared to (2).
>>> Do you see another way to do
this leveraging the table-spec
version?
>>
>>
>> 2 is cleaner but maybe
inconsistent with current
behavior, since /v1/tables
operation supports both v1 and v3.
We should only go to 2 only when
we have incompatible fields/break
changes according to discussion.
>>
>> Generally I agree with adding
table-spec into capabilities. For
example, we can expose this to
user in api so that user could
choose a supported table format
version without throwing exception.
>>
>> On Wed, Jul 3, 2024 at 12:18 AM
Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>> Spec is an interesting topic
we did not discuss. Robert, how do
you envision this to be used?
>>>
>>> In my mind, if a new table
format v3 is launched, there are 2
approaches we can go with, taking
CreateTable as an example:
>>>
>>> (1) increment the related
operation version, which means
that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and allow creating
tables in the v3 version.
>>>
>>> (2) update the existing table
metadata model to support both v2
and v3 fields, and the server
enforces the payload differently
based on the
TableMetadata.format-version
field. If the server does not
support v3, it can return
unsupported at that time.
>>>
>>> Either way we go, the
table-spec version does not need
to be a capability. (1) seems to
be cleaner, but has some overhead
in provisioning a new endpoint
compared to (2).
>>>
>>> Do you see another way to do
this leveraging the table-spec
version?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2024 at 6:03 AM
Eduard Tudenhöfner
<eduard.tudenhoef...@databricks.com.invalid>
wrote:
>>>>
>>>>
>>>> I couldn't make it to the
catalog sync meeting yesterday but
I watched the recording today
(thanks for providing that).
>>>>
>>>>> The missing piece is how
(new, capabilities-aware) clients
handle the case when a service
does _not_ return the capabilities
field (absent). My proposal would
be that a client should in this
case assume that all _currently_
existing capabilities are supported.
>>>>>
>>>>> - tables: [1]
>>>>> - views: [1]
>>>>> - remote-signing: [1]
>>>>> - multi-table-commit: [1]
>>>>> - register-table: [1]
>>>>> - table-metrics: [1]
>>>>> - table-spec: [1,2]
>>>>> - view-spec: [1,2]
>>>>>
>>>>>
>>>> The one thing I would like to
add here is that the current PR
uses the tables capability (as
version 1) as the default when a
server doesn't return capabilities
but it might be also ok to include
views (as version 1) because the
current client impl has some code
to deal with errors in case
endpoints don't exist.
>>>>
>>>> Unless we agree that the
currently existing functionality
in the REST spec is the default
behavior to be assumed for older
server, I'm not sure about
including remote-signing /
multi-table-commit /
register-table / table-metrics as
it has been indicated in earlier
comments on the PR/ML that not
every REST server supports these.
>>>>
>>>> That being said, we should
discuss whether we want the
default behavior (when an older
server doesn't send back
capabilities) to be
>>>> a) tables (version 1) only
>>>> b) the currently existing
functionality as defined in the
REST spec (as version 1)
>>>>
>>>>
>>>> On another note: Including
table-spec / view-spec seems to be
more informative in its nature as
I don't think a client would act
differently right now when seeing
these.
>>>>
>>>> Thanks
>>>> Eduard
>>>>
--
Ryan Blue
Databricks