Are there any other concerns with the proposal or should we start
a VOTE thread?
Eduard
On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid> wrote:
Re: remote signing, I agree that it does not look
like a server capability that a client can / should
discover. It is more like something that the server
instructs / configures the client to do.
While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems
more appropriate to indicate that this capability isn't
supported rather than a generic 501, because not every
server will support remote signing.
Good point regarding clients taking initiative and using
request singing without an explicit server-provided config.
It moves the client operations into a mode where the server
has more control (over having longer term client-side
credentials), so it looks like a reasonable mode to support
from the security perspective.
Let's keep that capability flag.
Cheers,
Dmitri.
On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
Hey everyone,
I've added a few inline comments below.
Re: remote signing, I agree that it does not look
like a server capability that a client can / should
discover. It is more like something that the server
instructs / configures the client to do.
While a server can control this behavior and instruct the
client to use remote signing, technically nothing is
preventing a client from configuring
s3.remote-signing-enabled=true. In such a case it seems
more appropriate to indicate that this capability isn't
supported rather than a generic 501, because not every
server will support remote signing.
The *vended-credentials* capability on the other hand is
more informative in its nature and a server indeed
configures a client. I think that was also one of the
reasons I removed this capability but added it later back
due to a comment from Jack.
I'm ok either way in terms of removing / keeping
*vended-credentials* as a capability but given that we'd
want to include *actionable* capabilities at this point,
I'd just remove it (nothing is preventing us from adding
it later if necessary).
In that case, why do we need all these other
capabilities like tables, remote-signing, etc. in the
first place?
Given that capabilities also carry versioning
information, clients can make more informed decisions on
which endpoints to call. One could argue that
generally throwing a 501 on everything that isn't
supported might be sufficient, but that doesn't
necessarily help a client in knowing which versions of a
capability are safe to call/use.
Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be
*tables* (with version 1) with a property in the REST
catalog that allows configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz* (all of
them defaulting to version 1).
Eduard
On Tue, Jul 9, 2024 at 7:00 PM Jack Ye
<yezhao...@gmail.com> wrote:
Yes I agree that sounds like a valid use case. So the
criteria so far is that capabilities are used for:
- controlling client-side fallback behavior
- failing expensive operations early if we know it
will eventually fail due to missing capability
Do we agree if this is the criteria we should use?
What about the other capabilities, namly tables,
remote-signing, credential-vending?
-Jack
On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
<b...@databricks.com.invalid>
<mailto:b...@databricks.com.invalid> wrote:
> does it make a difference if I declare the
capability or not?
I think that it does in other cases. Multi-table
commits, for example, are a building block for
multi-statement transactions. If a service
doesn't support multi-table commits then we
ideally want clients to know that ahead of time
so that they don't run a big transaction and then
fail because the commit is not supported.
On Tue, Jul 9, 2024 at 9:12 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid>
wrote:
Re: remote signing, I agree that it does not
look like a server capability that a client
can / should discover. It is more like
something that the server instructs /
configures the client to do.
Cheers,
Dmitri.
On Tue, Jul 9, 2024 at 12:05 PM Jack Ye
<yezhao...@gmail.com> wrote:
I was reconciling the discussion
yesterday, one point that was interesting
to me was that we agreed the purpose of
these capabilities is to "control
client-side fallback behavior", or at
least the client should behave
differently based on these capabilities.
However, this seems to be only needed so
far for views, or more specifically, for
loadView API only because it impacts the
fallback behavior to resolve the
identifier as a table or not.
For all the other capabilities listed,
and even the other endpoints in view,
because a server can decide to implement
it partially anyway and just document the
behavior, does it make a difference if I
declare the capability or not? The client
will not stop the request, the server
will just error out if it is not
supported. Maybe the error is not in the
expected code or message, but it is still
an error. In that case, why do we need
all these other capabilities like tables,
remote-signing, etc. in the first place?
Maybe it is too extreme of a thought, but
could anyone help describe how the other
capabilities could be used beyond
potentially returning an error earlier?
-Jack
On Tue, Jul 9, 2024 at 8:02 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid>
wrote:
Hi Eduard,
> I've also added the 501 error to
the response of the respective
endpoints but worth mentioning that
*HEAD* / *GET *requests must not
return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl would
e.g. return a *404* in such a case).
My reading on the Mozilla page makes
me think that it is phrased too
narrowly. Reading RFC 2616 [1] I
believe that it does not preclude
responding with 501 to GET and HEAD
requests. I think it means that GET
and HEAD methods must be supported by
"general purpose" servers. The
Iceberg REST server is not a general
purpose server for resources. So, I
think it should be fine to respond
with 501 to unimplemented endpoints.
Cheers,
Dmitri.
[1]
https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
On Tue, Jul 9, 2024 at 9:44 AM Eduard
Tudenhöfner
<etudenhoef...@apache.org> wrote:
Hey everyone,
I watched the catalog sync
recording today and updated the
PR
<https://github.com/apache/iceberg/pull/9940>
to remove fine-grained
capabilities like *register-table
/ table-metrics*.
The current capabilities (with
versioning information) in the PR
are:
* tables
* views
* remote-signing
* vended-credentials
* multi-table-commit
For servers that only
*partially* implement endpoints
under a capability the spec
requires the server to throw a
*501 Not Implemented*. I've also
added the 501 error to the
response of the respective
endpoints but worth
mentioning that *HEAD* / *GET
*requests must not return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl
would e.g. return a *404* in such
a case).
Regards
Eduard
On Thu, Jul 4, 2024 at 3:59 PM
Jean-Baptiste Onofré
<j...@nanthrax.net> wrote:
Hi Eduard,
It makes sense to return 501
for servers which don't
implement all
endpoints. It means that the
server will at least have to
implement
empty endpoints if needed
(that makes sense to me).
I think we should focus on
only "identified
capabilities". I think
that I proposed before that
the capabilities can be
overridden/provided by server
implementation. Else, I'm
afraid we
won't be flexible enough or
always behind the
implementation (if an
implementation wants to add
"my-foo-cap").
Regards
JB
On Thu, Jul 4, 2024 at
9:32 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
>
> I have clarified the
wording in #9940 around the
requirement on having to
implement all endpoints under
a particular capability.
>
> For servers that only
partially implement endpoints
under a capability the spec
requires the server to throw
a 501 Not Implemented. This
was suggested by Jack and it
seems reasonable to do that.
>
> Regarding the inclusion of
table-spec / view-spec as a
capability: I think this
might make sense for the next
iteration of the REST spec
but as I mentioned earlier I
don't see any clear benefit
for the current REST spec as
the client wouldn't do
anything with that information.
> If there is a clear benefit
of having this, then this can
still be added later to the
current REST spec but I
believe we should rather have
a few well-defined and
actionable capabilities
rather than too many.
>
> Eduard
>
> On Wed, Jul 3, 2024 at
5:44 AM Renjie Liu
<liurenjie2...@gmail.com> wrote:
>>>
>>> Spec is an interesting
topic we did not discuss.
Robert, how do you envision
this to be used?
>>> In my mind, if a new
table format v3 is launched,
there are 2 approaches we can
go with, taking CreateTable
as an example:
>>> (1) increment the related
operation version, which
means that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and allow
creating tables in the v3
version.
>>> (2) update the existing
table metadata model to
support both v2 and v3
fields, and the server
enforces the payload
differently based on the
TableMetadata.format-version
field. If the server does not
support v3, it can return
unsupported at that time.
>>> Either way we go, the
table-spec version does not
need to be a capability. (1)
seems to be cleaner, but has
some overhead in provisioning
a new endpoint compared to (2).
>>> Do you see another way to
do this leveraging the
table-spec version?
>>
>>
>> 2 is cleaner but maybe
inconsistent with current
behavior, since /v1/tables
operation supports both v1
and v3. We should only go to
2 only when we have
incompatible fields/break
changes according to discussion.
>>
>> Generally I agree with
adding table-spec into
capabilities. For example, we
can expose this to user in
api so that user could choose
a supported table format
version without throwing
exception.
>>
>> On Wed, Jul 3, 2024 at
12:18 AM Jack Ye
<yezhao...@gmail.com> wrote:
>>>
>>> Spec is an interesting
topic we did not discuss.
Robert, how do you envision
this to be used?
>>>
>>> In my mind, if a new
table format v3 is launched,
there are 2 approaches we can
go with, taking CreateTable
as an example:
>>>
>>> (1) increment the related
operation version, which
means that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and allow
creating tables in the v3
version.
>>>
>>> (2) update the existing
table metadata model to
support both v2 and v3
fields, and the server
enforces the payload
differently based on the
TableMetadata.format-version
field. If the server does not
support v3, it can return
unsupported at that time.
>>>
>>> Either way we go, the
table-spec version does not
need to be a capability. (1)
seems to be cleaner, but has
some overhead in provisioning
a new endpoint compared to (2).
>>>
>>> Do you see another way to
do this leveraging the
table-spec version?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2024 at
6:03 AM Eduard Tudenhöfner
<eduard.tudenhoef...@databricks.com.invalid>
<mailto:eduard.tudenhoef...@databricks.com.invalid>
wrote:
>>>>
>>>>
>>>> I couldn't make it to
the catalog sync meeting
yesterday but I watched the
recording today (thanks for
providing that).
>>>>
>>>>> The missing piece is
how (new, capabilities-aware)
clients handle the case when
a service does _not_ return
the capabilities field
(absent). My proposal would
be that a client should in
this case assume that all
_currently_ existing
capabilities are supported.
>>>>>
>>>>> - tables: [1]
>>>>> - views: [1]
>>>>> - remote-signing: [1]
>>>>> - multi-table-commit: [1]
>>>>> - register-table: [1]
>>>>> - table-metrics: [1]
>>>>> - table-spec: [1,2]
>>>>> - view-spec: [1,2]
>>>>>
>>>>>
>>>> The one thing I would
like to add here is that the
current PR uses the tables
capability (as version 1) as
the default when a server
doesn't return capabilities
but it might be also ok to
include views (as version 1)
because the current client
impl has some code to deal
with errors in case endpoints
don't exist.
>>>>
>>>> Unless we agree that the
currently existing
functionality in the REST
spec is the default behavior
to be assumed for older
server, I'm not sure about
including remote-signing /
multi-table-commit /
register-table /
table-metrics as it has been
indicated in earlier comments
on the PR/ML that not every
REST server supports these.
>>>>
>>>> That being said, we
should discuss whether we
want the default behavior
(when an older server doesn't
send back capabilities) to be
>>>> a) tables (version 1) only
>>>> b) the currently
existing functionality as
defined in the REST spec (as
version 1)
>>>>
>>>>
>>>> On another note:
Including table-spec /
view-spec seems to be more
informative in its nature as
I don't think a client would
act differently right now
when seeing these.
>>>>
>>>> Thanks
>>>> Eduard
>>>>
--
Ryan Blue
Databricks