Are there any other concerns with the proposal or should we
start a VOTE thread?
Eduard
On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid> wrote:
Re: remote signing, I agree that it does not
look like a server capability that a client can
/ should discover. It is more like something
that the server instructs / configures the
client to do.
While a server can control this behavior and
instruct the client to use remote signing,
technically nothing is preventing a client from
configuring s3.remote-signing-enabled=true. In such
a case it seems more appropriate to indicate that
this capability isn't supported rather than a
generic 501, because not every server will support
remote signing.
Good point regarding clients taking initiative and using
request singing without an explicit server-provided
config. It moves the client operations into a mode where
the server has more control (over having longer term
client-side credentials), so it looks like a reasonable
mode to support from the security perspective.
Let's keep that capability flag.
Cheers,
Dmitri.
On Wed, Jul 10, 2024 at 5:48 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
Hey everyone,
I've added a few inline comments below.
Re: remote signing, I agree that it does not
look like a server capability that a client can
/ should discover. It is more like something
that the server instructs / configures the
client to do.
While a server can control this behavior and
instruct the client to use remote signing,
technically nothing is preventing a client from
configuring s3.remote-signing-enabled=true. In such
a case it seems more appropriate to indicate that
this capability isn't supported rather than a
generic 501, because not every server will support
remote signing.
The *vended-credentials* capability on the other
hand is more informative in its nature and a server
indeed configures a client. I think that was also
one of the reasons I removed this capability but
added it later back due to a comment from Jack.
I'm ok either way in terms of removing / keeping
*vended-credentials* as a capability but given that
we'd want to include *actionable* capabilities at
this point, I'd just remove it (nothing is
preventing us from adding it later if necessary).
In that case, why do we need all these other
capabilities like tables, remote-signing, etc.
in the first place?
Given that capabilities also carry versioning
information, clients can make more informed
decisions on which endpoints to call. One could
argue that generally throwing a 501 on everything
that isn't supported might be sufficient, but that
doesn't necessarily help a client in knowing which
versions of a capability are safe to call/use.
Regarding the control of client-side fallback behavior:
I think the default fallback behavior should be
*tables* (with version 1) with a property in the
REST catalog that allows configuring this to e.g.
*rest-default-capabilities=tables,views,abc,xyz*
(all of them defaulting to version 1).
Eduard
On Tue, Jul 9, 2024 at 7:00 PM Jack Ye
<yezhao...@gmail.com> wrote:
Yes I agree that sounds like a valid use case.
So the criteria so far is that capabilities are
used for:
- controlling client-side fallback behavior
- failing expensive operations early if we know
it will eventually fail due to missing capability
Do we agree if this is the criteria we should
use? What about the other capabilities, namly
tables, remote-signing, credential-vending?
-Jack
On Tue, Jul 9, 2024 at 9:27 AM Ryan Blue
<b...@databricks.com.invalid>
<mailto:b...@databricks.com.invalid> wrote:
> does it make a difference if I declare the
capability or not?
I think that it does in other cases.
Multi-table commits, for example, are a
building block for multi-statement
transactions. If a service doesn't support
multi-table commits then we ideally want
clients to know that ahead of time so that
they don't run a big transaction and then
fail because the commit is not supported.
On Tue, Jul 9, 2024 at 9:12 AM Dmitri
Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid>
wrote:
Re: remote signing, I agree that it does
not look like a server capability that a
client can / should discover. It is more
like something that the server instructs
/ configures the client to do.
Cheers,
Dmitri.
On Tue, Jul 9, 2024 at 12:05 PM Jack Ye
<yezhao...@gmail.com> wrote:
I was reconciling the discussion
yesterday, one point that was
interesting to me was that we agreed
the purpose of these capabilities is
to "control client-side fallback
behavior", or at least the client
should behave differently based on
these capabilities. However, this
seems to be only needed so far for
views, or more specifically, for
loadView API only because it impacts
the fallback behavior to resolve the
identifier as a table or not.
For all the other capabilities
listed, and even the other endpoints
in view, because a server can decide
to implement it partially anyway and
just document the behavior, does it
make a difference if I declare the
capability or not? The client will
not stop the request, the server
will just error out if it is not
supported. Maybe the error is not in
the expected code or message, but it
is still an error. In that case, why
do we need all these other
capabilities like tables,
remote-signing, etc. in the first place?
Maybe it is too extreme of a
thought, but could anyone help
describe how the other capabilities
could be used beyond potentially
returning an error earlier?
-Jack
On Tue, Jul 9, 2024 at 8:02 AM
Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid>
<mailto:dmitri.bourlatch...@dremio.com.invalid>
wrote:
Hi Eduard,
> I've also added the 501 error
to the response of the
respective endpoints but worth
mentioning that *HEAD* / *GET
*requests must not return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl
would e.g. return a *404* in
such a case).
My reading on the Mozilla page
makes me think that it is
phrased too narrowly. Reading
RFC 2616 [1] I believe that it
does not preclude responding
with 501 to GET and HEAD
requests. I think it means that
GET and HEAD methods must be
supported by "general purpose"
servers. The Iceberg REST server
is not a general purpose server
for resources. So, I think it
should be fine to respond with
501 to unimplemented endpoints.
Cheers,
Dmitri.
[1]
https://www.rfc-editor.org/rfc/rfc2616#section-5.1.1
On Tue, Jul 9, 2024 at 9:44 AM
Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
Hey everyone,
I watched the catalog sync
recording today and updated
the PR
<https://github.com/apache/iceberg/pull/9940>
to remove fine-grained
capabilities like
*register-table /
table-metrics*.
The current capabilities
(with versioning
information) in the PR are:
* tables
* views
* remote-signing
* vended-credentials
* multi-table-commit
For servers that only
*partially* implement
endpoints under a capability
the spec requires the server
to throw a *501 Not
Implemented*. I've also
added the 501 error to the
response of the respective
endpoints but worth
mentioning that *HEAD* /
*GET *requests must not
return a 501
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/501> (this
implies that the server impl
would e.g. return a *404* in
such a case).
Regards
Eduard
On Thu, Jul 4, 2024 at
3:59 PM Jean-Baptiste Onofré
<j...@nanthrax.net> wrote:
Hi Eduard,
It makes sense to return
501 for servers which
don't implement all
endpoints. It means that
the server will at least
have to implement
empty endpoints if
needed (that makes sense
to me).
I think we should focus
on only "identified
capabilities". I think
that I proposed before
that the capabilities can be
overridden/provided by
server implementation.
Else, I'm afraid we
won't be flexible enough
or always behind the
implementation (if an
implementation wants to
add "my-foo-cap").
Regards
JB
On Thu, Jul 4, 2024 at
9:32 AM Eduard Tudenhöfner
<etudenhoef...@apache.org>
wrote:
>
> I have clarified the
wording in #9940 around
the requirement on
having to implement all
endpoints under a
particular capability.
>
> For servers that only
partially implement
endpoints under a
capability the spec
requires the server to
throw a 501 Not
Implemented. This was
suggested by Jack and it
seems reasonable to do that.
>
> Regarding the
inclusion of table-spec
/ view-spec as a
capability: I think this
might make sense for the
next iteration of the
REST spec but as I
mentioned earlier I
don't see any clear
benefit for the current
REST spec as the client
wouldn't do anything
with that information.
> If there is a clear
benefit of having this,
then this can still be
added later to the
current REST spec but I
believe we should rather
have a few well-defined
and actionable
capabilities rather than
too many.
>
> Eduard
>
> On Wed, Jul 3, 2024 at
5:44 AM Renjie Liu
<liurenjie2...@gmail.com>
wrote:
>>>
>>> Spec is an
interesting topic we did
not discuss. Robert, how
do you envision this to
be used?
>>> In my mind, if a new
table format v3 is
launched, there are 2
approaches we can go
with, taking CreateTable
as an example:
>>> (1) increment the
related operation
version, which means
that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and
allow creating tables in
the v3 version.
>>> (2) update the
existing table metadata
model to support both v2
and v3 fields, and the
server enforces the
payload differently
based on the
TableMetadata.format-version
field. If the server
does not support v3, it
can return unsupported
at that time.
>>> Either way we go,
the table-spec version
does not need to be a
capability. (1) seems to
be cleaner, but has some
overhead in provisioning
a new endpoint compared
to (2).
>>> Do you see another
way to do this
leveraging the
table-spec version?
>>
>>
>> 2 is cleaner but
maybe inconsistent with
current behavior, since
/v1/tables operation
supports both v1 and v3.
We should only go to 2
only when we have
incompatible
fields/break changes
according to discussion.
>>
>> Generally I agree
with adding table-spec
into capabilities. For
example, we can expose
this to user in api so
that user could choose a
supported table format
version without throwing
exception.
>>
>> On Wed, Jul 3, 2024
at 12:18 AM Jack Ye
<yezhao...@gmail.com> wrote:
>>>
>>> Spec is an
interesting topic we did
not discuss. Robert, how
do you envision this to
be used?
>>>
>>> In my mind, if a new
table format v3 is
launched, there are 2
approaches we can go
with, taking CreateTable
as an example:
>>>
>>> (1) increment the
related operation
version, which means
that POST
/v2/{prefix}/namespaces/{ns}/tables
will be created and
allow creating tables in
the v3 version.
>>>
>>> (2) update the
existing table metadata
model to support both v2
and v3 fields, and the
server enforces the
payload differently
based on the
TableMetadata.format-version
field. If the server
does not support v3, it
can return unsupported
at that time.
>>>
>>> Either way we go,
the table-spec version
does not need to be a
capability. (1) seems to
be cleaner, but has some
overhead in provisioning
a new endpoint compared
to (2).
>>>
>>> Do you see another
way to do this
leveraging the
table-spec version?
>>>
>>> -Jack
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2024
at 6:03 AM Eduard
Tudenhöfner
<eduard.tudenhoef...@databricks.com.invalid>
<mailto:eduard.tudenhoef...@databricks.com.invalid>
wrote:
>>>>
>>>>
>>>> I couldn't make it
to the catalog sync
meeting yesterday but I
watched the recording
today (thanks for
providing that).
>>>>
>>>>> The missing piece
is how (new,
capabilities-aware)
clients handle the case
when a service does
_not_ return the
capabilities field
(absent). My proposal
would be that a client
should in this case
assume that all
_currently_ existing
capabilities are supported.
>>>>>
>>>>> - tables: [1]
>>>>> - views: [1]
>>>>> - remote-signing: [1]
>>>>> -
multi-table-commit: [1]
>>>>> - register-table: [1]
>>>>> - table-metrics: [1]
>>>>> - table-spec: [1,2]
>>>>> - view-spec: [1,2]
>>>>>
>>>>>
>>>> The one thing I
would like to add here
is that the current PR
uses the tables
capability (as version
1) as the default when a
server doesn't return
capabilities but it
might be also ok to
include views (as
version 1) because the
current client impl has
some code to deal with
errors in case endpoints
don't exist.
>>>>
>>>> Unless we agree
that the currently
existing functionality
in the REST spec is the
default behavior to be
assumed for older
server, I'm not sure
about including
remote-signing /
multi-table-commit /
register-table /
table-metrics as it has
been indicated in
earlier comments on the
PR/ML that not every
REST server supports these.
>>>>
>>>> That being said, we
should discuss whether
we want the default
behavior (when an older
server doesn't send back
capabilities) to be
>>>> a) tables (version
1) only
>>>> b) the currently
existing functionality
as defined in the REST
spec (as version 1)
>>>>
>>>>
>>>> On another note:
Including table-spec /
view-spec seems to be
more informative in its
nature as I don't think
a client would act
differently right now
when seeing these.
>>>>
>>>> Thanks
>>>> Eduard
>>>>
--
Ryan Blue
Databricks